A powerful CLI tool that automatically splits PDF and EPUB files into chapter-wise sections with smart OCR detection. Perfect for learners using AI-powered study assistants.
Learning from books becomes significantly more effective when combined with AI study assistants. This tool enables you to:
- Feed individual chapters to AI tutors for focused discussions
- Use ChatGPT's Study Mode on specific sections
- Leverage Gemini's Guided Learning with curated content
- Create notebooks in NotebookLM chapter by chapter
- Generate summaries and quizzes per chapter
Full Book (PDF/EPUB)
            │
            ▼
┌───────────────────────┐
│ PDF Splitter Tool     │
│ ├── Pre-text          │ → Intro materials
│ ├── Chapter 01.pdf    │ → AI Study Session 1
│ ├── Chapter 02.pdf    │ → AI Study Session 2
│ ├── Chapter 03.pdf    │ → AI Study Session 3
│ └── Post-text         │ → Appendices/References
└───────────────────────┘
            │
            ▼
AI Learning Tools
├── ChatGPT Study Mode
├── Gemini Guided Learning
├── NotebookLM
└── Custom AI Tutors
ChatGPT Study Mode was launched by OpenAI on July 29, 2025. It offers step-by-step guidance instead of quick answers, helping you learn and retain knowledge better.
Upload individual chapter PDFs to ChatGPT and ask:
"Analyze this chapter about [topic] and create:
1. Key concepts summary
2. 5 quiz questions
3. Real-world examples
4. Connections to previous chapters"
Key Features:
- Guided learning with questions, hints, and step-by-step explanations
- Personalized support that adapts to your skill level
- Knowledge checks with quizzes and open-ended questions
- Progress tracking to show mastery and areas to focus
Learn more: chatgpt.com/features/study-mode
Google launched Guided Learning in Gemini on August 6, 2025. It acts as a personal learning companion that helps you build deep understanding of subjects.
Feed chapters sequentially:
"Guide me through this chapter using the Feynman technique.
Start with the main thesis, then break down complex concepts,
and end with practical applications."
Key Features:
- Interactive study partner for deeper understanding
- Upload course material, debug code, or work through difficult concepts
- Visual responses and interactive study aids
- Personalized explanations adapting to your needs
Learn more: blog.google/products/gemini/guided-learning
Google's NotebookLM added several AI learning features during 2025.
2025 Features:
- Audio Overviews - Turn sources into engaging "Deep Dive" discussions
- Video Overviews - Generate video summaries of your documents
- Flashcards & Quizzes - Create from your documents instantly
- Learning Guide - Generate tailored study guides
- Mind Maps - Visualize connections between concepts
- Presentations - Create polished outlines with talking points
How to Use with Split Chapters:
NotebookLM is used through the web interface at notebooklm.google. Here's how to use it with this tool's output:
- Split your book:
    python -m pdfsplitter.cli "book.pdf"
- Open NotebookLM at notebooklm.google
- Click "Upload" and select the chapter PDFs from the output folder
- Use NotebookLM features:
  - Click "Audio Overview" to create AI audio discussions
  - Use "Guide" to generate study guides
  - Ask questions about specific chapters
Example Workflow:
Output folder: book_output/
├── chapter_01.pdf → Upload to NotebookLM
├── chapter_02.pdf → Upload to NotebookLM
├── chapter_03.pdf → Upload to NotebookLM
└── ...
NotebookLM will create Audio Overviews, summaries, and answer questions
about your content using Gemini's AI capabilities.
Note: NotebookLM doesn't have a public Python API for creating notebooks programmatically. The free version is used through the web interface. For enterprise/automated use, Google Cloud offers NotebookLM Enterprise APIs.
- Automatic Chapter Detection - Uses TOC and text analysis
- Smart OCR Decision - LLM determines if scanned PDFs need OCR
- Pre/Post Text Separation - Isolates front matter and appendices
- Multiple Format Support - Handles both PDF and EPUB
- CLI Interface - Simple, command-line based
- Caching - Remembers OCR decisions for speed
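The caching feature in the list above can be pictured as a small lookup keyed by file content. The sketch below is not the tool's actual cache: the function names, the JSON cache file, and the `decide` callback (standing in for the LLM-backed OCR check) are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """Hash the file contents so a renamed copy still hits the cache."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_ocr_cached(path: Path, cache_file: Path, decide) -> bool:
    """Return the cached OCR decision for `path`, calling `decide` only on a miss.

    `decide` is a hypothetical callable (for example, a wrapper around an
    LLM request) that takes a Path and returns True when OCR is needed.
    """
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    key = file_fingerprint(path)
    if key not in cache:
        cache[key] = bool(decide(path))
        cache_file.write_text(json.dumps(cache, indent=2))
    return cache[key]
```

Keying on a content hash rather than the file name means the decision survives moving or renaming the book, at the cost of reading the file once per lookup.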
# Clone the repository
git clone https://github.com/streetquant/pdf-splitter.git
cd pdf-splitter
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
.\venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

Create a .env file with your API key:

OPENROUTER_API_KEY=your-api-key-here
MODEL_NAME=nvidia/nemotron-3-nano-30b-a3b:free

Get a free API key from OpenRouter.
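To check that the two variables are picked up before running the tool, a standard-library sketch like the one below may help. It is not how the tool itself loads configuration (its `config.py` is not shown here, and it may well use a dedicated loader); `load_env_file` and `get_settings` are hypothetical helper names.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_settings(path: str = ".env") -> dict:
    """Merge .env values with the process environment (environment wins)."""
    file_values = load_env_file(path)
    api_key = os.environ.get("OPENROUTER_API_KEY", file_values.get("OPENROUTER_API_KEY"))
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    model = os.environ.get("MODEL_NAME", file_values.get("MODEL_NAME"))
    return {"api_key": api_key, "model": model}
```

Calling `get_settings()` from the project root raises a `RuntimeError` if the key is missing, which surfaces a configuration mistake before any book is processed.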
# Process a PDF
python -m pdfsplitter.cli book.pdf
# Process an EPUB
python -m pdfsplitter.cli textbook.epub
# Specify output directory
python -m pdfsplitter.cli book.pdf -o ./my-chapters
# Verbose mode
python -m pdfsplitter.cli book.pdf -v

book_output/
├── metadata.json # Chapter information
├── pretext.pdf # Front matter (TOC, preface)
├── chapter_01.pdf # Chapter 1
├── chapter_02.pdf # Chapter 2
├── chapter_03.pdf # Chapter 3
├── ...
└── posttext.pdf # Appendices, references
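When feeding the output to a study tool, it helps to have the files in reading order. The sketch below relies only on the file names shown in the layout above (`pretext.pdf`, `chapter_NN.pdf`, `posttext.pdf`); it does not read `metadata.json`, whose schema is not documented here, and `ordered_sections` is a hypothetical helper, not part of the tool.

```python
import re
from pathlib import Path


def ordered_sections(output_dir: str) -> list:
    """Return PDFs in reading order: pretext, numbered chapters, posttext."""
    folder = Path(output_dir)
    # Sort numerically so chapter_10 comes after chapter_02, not before it.
    chapters = sorted(
        (p for p in folder.glob("chapter_*.pdf") if re.fullmatch(r"chapter_\d+", p.stem)),
        key=lambda p: int(p.stem.split("_")[1]),
    )
    pretext = [p for p in [folder / "pretext.pdf"] if p.exists()]
    posttext = [p for p in [folder / "posttext.pdf"] if p.exists()]
    return pretext + chapters + posttext
```

For example, `[p.name for p in ordered_sections("book_output")]` gives the upload order for a sequential study session.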
$ python -m pdfsplitter.cli "Deep Learning.pdf"
Processing complete!
Input: Deep Learning.pdf
Output: Deep Learning_output
Chapters: 12
01. Pre-text
02. Chapter 1: Math Foundations
03. Chapter 2: Neural Networks
...
12. Post-text

Prompt for ChatGPT Study Mode:
I'm studying Chapter 3 on Neural Networks from a deep learning book.
Please:
1. Explain the key concepts in simple terms
2. Create a concept map connecting to Chapter 2
3. Generate 5 practice problems
4. Suggest real-world applications
5. Recommend which sections to reread
Prompt for Gemini Guided Learning:
Using the Feynman technique, help me understand this chapter:
- Start with the main thesis
- Break down 3 complex concepts
- End with practical applications in real-world scenarios
NotebookLM Integration:
# Upload chapters to NotebookLM for comprehensive learning
# Features: Audio Overviews, Flashcards, Mind Maps, Quizzes

Combine chapters across multiple AI tools for comprehensive coverage:
| Tool | Best For | Chapter Usage |
|---|---|---|
| ChatGPT Study Mode | Interactive Q&A, step-by-step explanations | Upload 1-2 chapters per session |
| Gemini Guided Learning | Personalized learning paths | Sequential chapter progression |
| NotebookLM | Audio summaries, flashcards, research | Upload entire book |
The tool detects chapters using:
- CHAPTER 1, Chapter 1, CHAPTER ONE
- Part I, Part 1
- Table of Contents entries
Posttext detection includes:
- APPENDIX, REFERENCES, BIBLIOGRAPHY
- INDEX, GLOSSARY, ACKNOWLEDGMENTS
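The heading patterns listed above could be matched with regular expressions along these lines. This is a minimal sketch under assumptions: the real patterns live in the tool's `constants.py` and may differ, and `CHAPTER_RE`, `POSTTEXT_RE`, and `classify_heading` are names invented for this example.

```python
import re

# Hypothetical patterns modeled on the examples listed above.
CHAPTER_RE = re.compile(
    r"^\s*(chapter|part)\s+"
    r"(\d+|[ivxlcdm]+|one|two|three|four|five|six|seven|eight|nine|ten)\b",
    re.IGNORECASE,
)
POSTTEXT_RE = re.compile(
    r"^\s*(appendix|references|bibliography|index|glossary|acknowledgments)\b",
    re.IGNORECASE,
)


def classify_heading(line: str) -> str:
    """Label a line as 'chapter', 'posttext', or 'body'."""
    if CHAPTER_RE.match(line):
        return "chapter"
    if POSTTEXT_RE.match(line):
        return "posttext"
    return "body"
```

Patterns this loose produce false positives (a body line that happens to start with "Index of ..." would be labeled post-text), which is one reason to combine text matching with Table of Contents entries.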
from pdfsplitter.core import split_pdf, split_epub

# Programmatic usage
result = split_pdf("book.pdf", "./output")
for chapter in result.chapters:
    print(f"{chapter.title}: pages {chapter.start_page}-{chapter.end_page}")

# Run all tests
pytest tests/ -v
# Run specific test
pytest tests/test_pdf_processor.py -v

pdf-splitter/
├── src/pdfsplitter/
│   ├── cli.py # Command-line interface
│   ├── config.py # Configuration
│   ├── constants.py # Patterns and constants
│   ├── core/
│   │   ├── pdf_processor.py # PDF splitting logic
│   │   ├── epub_processor.py # EPUB splitting logic
│   │   ├── ocr_detector.py # Smart OCR detection
│   │   └── models.py # Data models
│   └── utils/
│       ├── llm.py # OpenRouter integration
│       ├── cache.py # Caching
│       └── logging.py # Logging
├── tests/ # Test suite
├── pyproject.toml # Project config
└── requirements.txt # Dependencies
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
MIT License - feel free to use in your projects.
- PyMuPDF - PDF processing
- ebooklib - EPUB handling
- OpenRouter - LLM API access
- PyPDF2 - PDF manipulation
Happy Learning!