AutoNote is a desktop application that automatically generates comprehensive study notes from Canvas LMS lecture materials and videos. It downloads slides and videos, transcribes audio, aligns the transcript to slide pages, and produces Markdown note files per lecture using an LLM.
- Launch the AutoNote AppImage / installer. On first launch, a setup wizard guides you through installing the ML environment.
- Settings → enter Canvas URL and Canvas token → Save All → Refresh Courses
- Settings → enter at least one LLM API key (OpenAI / Anthropic / Gemini), or use Claude CLI if you have Claude Code installed (no API key needed)
- Pipeline → select course → click Run Pipeline
The app has seven pages, accessible from the left navigation rail:
| Page | Purpose |
|---|---|
| Dashboard | Course overview: video / caption / alignment counts per course; click a course to view per-video status and delete generated files |
| Pipeline | One-click full pipeline wizard with pipelined execution |
| Download | Fine-grained control over video and material downloads |
| Transcribe | Transcribe downloaded videos to timestamped captions |
| Align | Map caption segments to specific slide pages |
| Generate Notes | Generate the final Markdown study notes |
| Settings | API keys, ML environment, model selection, tunable constants |
Every page has a terminal panel pinned to the bottom that streams live output. The Stop button cancels the running process at any time.
On first launch, AutoNote checks if the required ML environment is installed. If not, a setup wizard appears with:
- Required components (pre-checked, cannot be skipped): Core Python packages for the pipeline
- Optional components (pre-checked, can be unchecked):
- Local transcription (faster-whisper + GPU)
- Local embeddings (sentence-transformers + FAISS)
- BGE-M3 model for high-quality video-slide matching
- Panopto video download (Playwright browser)
Click Install and Continue to install everything, or Skip for now to configure later in Settings.
Open Settings and fill in the Connection card:
| Field | Value |
|---|---|
| Canvas URL | Your institution's Canvas domain (e.g. canvas.nus.edu.sg). |
| Panopto Host | Panopto video host (e.g. mediaweb.ap.panopto.com). Leave blank — it is auto-detected. |
| Output Dir | Directory where all pipeline files are stored (default: ~/AutoNote). |
Fill in the API Keys & Credentials card:
| Field | Required for |
|---|---|
| Canvas Token | Downloading materials and listing videos. Get it from Canvas → Account → Settings → New Access Token. |
| OpenAI API Key | Note generation with OpenAI models (gpt-5.1, gpt-4.1, o3, ...). |
| Anthropic API Key | Note generation with Claude models via API. |
| Gemini API Key | Note generation with Gemini models; also enables the Google text-embedding-004 option for video↔slide matching. |
| Jina API Key | Optional: enables the jina-embeddings-v4 remote option (text + multimodal) for video↔slide matching. |
Alternatively, if you have Claude Code installed on your computer, you can select Claude CLI (local) as the note generation model — no API key needed.
Click Save All, then Refresh Courses.
GPU note: Whisper large-v3 and sentence-transformer embeddings run on GPU. A CUDA-capable GPU with >= 8 GB VRAM is recommended. The app works on CPU but transcription will be much slower.
Go to Pipeline, select a course from the dropdown, and click Run Pipeline.
| # | Step | What it does |
|---|---|---|
| 1 | Download materials | Downloads all lecture slides, PDFs, and other files from Canvas |
| 2 | Download videos | Downloads all Panopto lecture recordings (MP4) |
| 3 | Transcribe + Align | Transcribes videos, extracts frames from screen recordings, and aligns transcripts to slides. Uses pipelined threading: while video N+1 is transcribing, video N's frames are extracted concurrently. |
| 4 | Generate study notes | Sends slides + aligned transcripts to an LLM to generate one Markdown note file per lecture |
Each step only runs if the previous step succeeded. You can uncheck any step to skip it.
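
The overlap described in step 3 can be pictured with a small sketch. This is not AutoNote's actual code: `transcribe` and `extract_frames` are hypothetical stand-ins, and a single background worker for frame extraction is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe(video):
    """Hypothetical stand-in for Whisper transcription."""
    return f"captions:{video}"


def extract_frames(video):
    """Hypothetical stand-in for screen-recording frame extraction."""
    return f"frames:{video}"


def run_pipelined(videos):
    """Transcribe videos in order on the calling thread while the frames of
    the previously transcribed video are extracted on a background thread."""
    captions = {}
    futures = {}
    with ThreadPoolExecutor(max_workers=1) as pool:
        for video in videos:
            captions[video] = transcribe(video)
            # Frame extraction for this video now overlaps with the
            # transcription of the next video in the loop.
            futures[video] = pool.submit(extract_frames, video)
    # Leaving the with-block waits for all background work to finish.
    return {v: (captions[v], futures[v].result()) for v in videos}
```

Calling `run_pipelined(["v1", "v2"])` returns one `(captions, frames)` pair per video, in input order.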
| Option | Description |
|---|---|
| Slack mode | Adds random delays between downloads to avoid rate-limiting. |
| Course name | Name that appears in the final notes file (auto-filled when you select a course). |
| Language | Language for generated notes: English or Chinese. Selectable per-run — notes are generated in English first, then translated. Changing language auto-regenerates cached sections. |
| Detail level | Controls note verbosity (0-2 outline, 3-5 bullets, 6-8 paragraphs, 9-10 exhaustive). |
| Lecture filter | Process only specific lectures, e.g. 1-5 or 1,3,5. Leave blank for all. |
| Force regenerate | Re-run the selected pipeline steps even if output files already exist. Without this, only missing files are processed. |
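
As an illustration of the Lecture filter syntax, a minimal parser might look like the sketch below. The function name `parse_lecture_filter` is hypothetical, and accepting mixed input such as `1-3,7` is an assumption; the options table only shows `1-5` and `1,3,5`.

```python
def parse_lecture_filter(spec):
    """Return a sorted list of lecture numbers, or None meaning 'all lectures'."""
    spec = spec.strip()
    if not spec:
        return None  # blank filter: process every lecture
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range such as "1-5"
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            selected.update(range(start, end + 1))
        else:
            selected.add(int(part))
    return sorted(selected)
```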
For screen-share videos (slide recordings), the pipeline automatically:
- Classifies the video as screen vs camera using heuristics
- Extracts unique frames via scene detection + perceptual hash deduplication
- Picks the most informative frame per slide page (handles incremental reveals)
- Builds timestamp-based transcript-to-frame alignment
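
The deduplication step in the list above can be sketched as follows. Which perceptual hash AutoNote uses is not stated here, so this example assumes a simple average hash over small grayscale grids; `average_hash`, `dedupe_frames`, and the `max_distance` threshold are illustrative names and values. A real pipeline would first downscale each frame (for example to 8×8 pixels).

```python
def average_hash(pixels):
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedupe_frames(frames, max_distance=2):
    """frames: list of (timestamp, grayscale grid).
    Keep a frame only if its hash differs from every kept frame
    by more than max_distance bits; return the kept timestamps."""
    kept = []
    for timestamp, pixels in frames:
        h = average_hash(pixels)
        if all(hamming(h, kept_hash) > max_distance for _, kept_hash in kept):
            kept.append((timestamp, h))
    return [t for t, _ in kept]
```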
Camera recordings use traditional slide-based alignment with PDF/PPTX files.
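
For the screen-share case, timestamp-based alignment can be sketched with a binary search over frame times. This is an assumed approach, not the app's confirmed logic: each caption segment is attached to the last unique frame that appeared at or before the segment's start time, and `align_segments_to_frames` is a hypothetical name.

```python
from bisect import bisect_right


def align_segments_to_frames(segments, frame_times):
    """segments: list of (start_seconds, text).
    frame_times: sorted list of seconds at which each unique frame first appears.
    Returns {frame_index: [text, ...]}."""
    if not frame_times:
        return {}
    aligned = {i: [] for i in range(len(frame_times))}
    for start, text in segments:
        idx = bisect_right(frame_times, start) - 1
        idx = max(idx, 0)  # speech before the first frame attaches to frame 0
        aligned[idx].append(text)
    return aligned
```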
For pairing each transcript with its slide file, the pipeline uses an embedding model selected with `python semantic_alignment.py --suggest-matches --match-model <MODEL>`:
| Model | Type | Requires | Get key / model |
|---|---|---|---|
| bge-m3 (default) | Local (BAAI/bge-m3, ~2.3 GB) | NVIDIA GPU ≥ 6 GB VRAM recommended (CUDA 11.8+); runs on CPU about 10× slower. Auto-downloaded from huggingface.co/BAAI/bge-m3 on first use | - |
| mpnet | Local (all-mpnet-base-v2, ~420 MB) | Any GPU ≥ 2 GB VRAM or CPU (acceptable speed). Auto-downloaded from huggingface.co/sentence-transformers/all-mpnet-base-v2 | - |
| jina | Remote (jina-embeddings-v4) | JINA_API_KEY env var or jina_api.txt in data dir | jina.ai/api-dashboard (free tier available) |
| google | Remote (text-embedding-004) | GEMINI_API_KEY / GOOGLE_API_KEY env var or gemini_api.txt in data dir | aistudio.google.com/apikey (free tier available) |
GPU guidance for local models. Whisper large-v3 transcription dominates VRAM usage (~10 GB); embedding models fit alongside on a 12–16 GB card (RTX 3060 12 GB, 4070, 4080, etc.). With only 6–8 GB VRAM, prefer mpnet over bge-m3, or switch to a remote option so the GPU is free for Whisper. On CPU, remote (jina/
Other API keys mentioned above. OpenAI: platform.openai.com/api-keys. Anthropic: console.anthropic.com/settings/keys. Canvas: Canvas → Account → Settings → New Access Token.
Remote options let the matcher run without any local ML environment — the google model reuses the same Gemini key used for note generation.
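
Whichever embedding model is chosen, transcript-to-slide matching comes down to comparing vectors. The sketch below assumes cosine similarity with a best-match pick per transcript; how `semantic_alignment.py` actually scores candidates is not documented here, and `suggest_matches` plus the toy vectors are purely illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest_matches(transcript_vecs, slide_vecs):
    """For each transcript name, return the slide file whose embedding
    has the highest cosine similarity to the transcript embedding."""
    matches = {}
    for t_name, t_vec in transcript_vecs.items():
        matches[t_name] = max(
            slide_vecs, key=lambda s: cosine(t_vec, slide_vecs[s])
        )
    return matches
```

In practice the vectors would come from the selected model (bge-m3, mpnet, jina, or google) rather than being written by hand.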
On the Dashboard, click any course card to open a detail modal showing all transcribed videos with their processing status:
- Aligned / No align: whether the video has been aligned to slides
- Notes / No notes: whether note sections exist for this video
- Delete button: removes the transcript, alignment, note sections, per-video notes, and rendered images for that video
The Generate Notes page produces one Markdown note file per lecture (multi-part lectures are merged into a single file).
| Option | Description |
|---|---|
| Course name | Used as the note file name and title. Auto-filled when you select a course. |
| Language | English or Chinese — selectable per-run. |
| Lecture filter | Generate notes for specific lectures only. |
| Detail level | 0-10 slider. Default: 7. |
| Force regenerate | Re-generate all sections. Without this, cached sections are reused. |
| Merge-only | Re-run the merge pass without re-generating any sections. |
| Iterative mode | Automatically raises detail level until quality target is met. |
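
The detail-level bands listed for the Pipeline page (0-2 outline, 3-5 bullets, 6-8 paragraphs, 9-10 exhaustive) and the idea behind Iterative mode can be sketched as below. The function names, the one-step increment, and the `score_notes` callback are all assumptions; how AutoNote measures its quality target is not described in this document.

```python
def detail_style(level):
    """Map a 0-10 detail level to the note style band it falls in."""
    if not 0 <= level <= 10:
        raise ValueError("detail level must be between 0 and 10")
    if level <= 2:
        return "outline"
    if level <= 5:
        return "bullets"
    if level <= 8:
        return "paragraphs"
    return "exhaustive"


def iterate_detail(score_notes, target, start=7, max_level=10):
    """Raise the detail level one step at a time until
    score_notes(level) reaches the target or max_level is hit.
    score_notes is a hypothetical quality-scoring callback."""
    level = start
    while level < max_level and score_notes(level) < target:
        level += 1
    return level
```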
<Output Dir>/<course_id>/notes/
├── CS2105_L01_notes.md Per-lecture note files
├── CS2105_L02_notes.md
├── sections/
│ ├── L01_S01.md Cached per-chunk sections (resumable)
│ ├── L01_S02.md
│ └── .language Language marker for cache invalidation
└── images/
    ├── L01/ Rendered slide images
    └── L02/
Each note file includes source metadata at the top:
- **Video**: CS2105 Lecture on 1_23_2026 (Fri)
- **Slides**: Lecture 1 - Introduction.pdf
Notes are resumable: re-running skips sections already cached on disk. Only missing sections are generated. Truncated responses (LLM token limit) are detected and not cached, so they auto-regenerate on the next run.
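
The resume behaviour can be pictured with a minimal sketch. It assumes one `.md` file per section in a cache directory and an LLM wrapper that reports truncation; `generate_sections` and the `call_llm` callback (returning `(text, truncated)`) are hypothetical names, not AutoNote's API.

```python
from pathlib import Path


def generate_sections(section_ids, cache_dir, call_llm):
    """Generate only the sections that are not cached yet.
    call_llm(section_id) must return (text, truncated).
    Returns the ids that were generated and cached during this run."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    generated = []
    for sid in section_ids:
        path = cache_dir / f"{sid}.md"
        if path.exists():
            continue  # cached by an earlier run: skip
        text, truncated = call_llm(sid)
        if truncated:
            continue  # do not cache, so the next run retries this section
        path.write_text(text, encoding="utf-8")
        generated.append(sid)
    return generated
```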
| Provider | Models |
|---|---|
| OpenAI | gpt-5.1, gpt-5.2, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3, o4-mini |
| Anthropic | Claude Opus 4.6, Sonnet 4.6, Sonnet 4.5, Haiku 4.5 |
| Google | Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.0 Flash |
| DeepSeek | DeepSeek V3, DeepSeek R1 |
| xAI | Grok 3, Grok 3 mini |
| Mistral | Mistral Large, Medium, Small, Codestral |
| Claude CLI | Uses local claude -p (no API key needed, requires Claude Code installed) |
Use `--model` on the CLI to override: `python note_generation.py --model claude-cli --language zh`
| Scenario | What to do |
|---|---|
| New lecture added to Canvas | Re-run the pipeline — only new sections are generated |
| Slides updated for a lecture | Generate Notes → Lecture filter + Force regenerate for that lecture |
| Want to change note language | Select the new language — cached sections auto-regenerate when the language differs |
| Want to delete bad notes | Dashboard → click course card → click Delete next to the video |
| Pipeline stopped partway | Just run again — completed steps are skipped |
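
The language-change scenario relies on the `.language` marker stored next to the cached sections. A rough sketch of such a check follows; it assumes the marker holds a plain language code such as `en` or `zh` and that stale sections are simply deleted, neither of which is confirmed by this document. `prepare_cache` is a hypothetical name.

```python
from pathlib import Path


def prepare_cache(sections_dir, language):
    """Clear cached sections if they were written in another language,
    then record the current language. Returns True if the cache was cleared."""
    sections_dir = Path(sections_dir)
    sections_dir.mkdir(parents=True, exist_ok=True)
    marker = sections_dir / ".language"
    stale = (
        marker.exists()
        and marker.read_text(encoding="utf-8").strip() != language
    )
    if stale:
        for cached in sections_dir.glob("*.md"):
            cached.unlink()  # force regeneration in the new language
    marker.write_text(language, encoding="utf-8")
    return stale
```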
All pipeline output lives under the configured Output Dir (~/AutoNote by default):
~/AutoNote/
├── manifest.json Video download state
├── <course_id>/
│ ├── videos/ Downloaded MP4 recordings
│ ├── materials/ Downloaded slides, PDFs
│ ├── captions/ Whisper transcripts (JSON)
│ ├── frames/ Extracted screen recording frames
│ ├── alignment/ Alignment JSON per slide file
│ └── notes/ Generated notes + images
└── download_log.json Material download log
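
Given this layout, a script could work out which videos still need transcription. The sketch below assumes that each caption JSON file shares its base name with the MP4 it was produced from; that naming rule and the helper name `videos_missing_captions` are assumptions.

```python
from pathlib import Path


def videos_missing_captions(output_dir, course_id):
    """List MP4 file names under videos/ that have no matching
    <name>.json file under captions/ for the given course."""
    course = Path(output_dir) / course_id
    done = {p.stem for p in (course / "captions").glob("*.json")}
    return sorted(
        v.name for v in (course / "videos").glob("*.mp4") if v.stem not in done
    )
```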
App configuration and ML environment:
~/.auto_note/
├── config.json Canvas URL, Panopto host, output dir
├── canvas_token.txt Canvas API token
├── openai_api.txt OpenAI API key
├── anthropic_key.txt Anthropic API key
├── gemini_api.txt Gemini API key
├── scripts/ Pipeline scripts (installed from AppImage)
└── venv/ ML virtual environment
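
Keys can come from environment variables or from the text files shown above. A minimal lookup sketch follows; checking environment variables before the file, and treating `~/.auto_note` as the "data dir", are assumptions, and `resolve_api_key` is a hypothetical helper.

```python
import os
from pathlib import Path


def resolve_api_key(env_vars, filename, config_dir=None):
    """Return the first non-empty environment variable from env_vars,
    else the stripped contents of the key file, else None."""
    for name in env_vars:
        value = os.environ.get(name, "").strip()
        if value:
            return value
    # Assumed default location of the key files.
    base = Path(config_dir) if config_dir else Path.home() / ".auto_note"
    key_file = base / filename
    if key_file.is_file():
        return key_file.read_text(encoding="utf-8").strip() or None
    return None
```

For example, `resolve_api_key(["GEMINI_API_KEY", "GOOGLE_API_KEY"], "gemini_api.txt")` would cover the lookup described for the google matching model.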
On Windows, two installer formats are available:
- NSIS (.exe) — traditional installer
- MSIX (.appx) — modern Windows package (less SmartScreen friction)
When uninstalling from Windows Settings, a dialog asks what to keep:
- Generated notes and downloads (kept by default)
- Settings and API keys (kept by default)
- ML environment ~2 GB (deleted by default)
"No courses loaded" on the Dashboard → Go to Settings, enter Canvas URL and Canvas token, click Save All, then Refresh Courses.
Video list shows 0 videos for a course → The Panopto host has not been detected yet. Click List videos once; it is auto-detected. If it still fails, enter the Panopto domain manually in Settings.
Transcription is very slow → The app is running on CPU. Ensure a CUDA-capable GPU is present and PyTorch was installed with GPU support. Use Settings → ML Environment → Reinstall if needed.
ModuleNotFoundError when running the pipeline → The ML environment is missing packages. Go to Settings → ML Environment → Reinstall.
Notes are in the wrong language → Select the correct language from the Pipeline or Generate page dropdown. If cached sections exist in the old language, they are automatically regenerated.
Notes have truncated sections → The LLM hit its token limit. Truncated sections are not cached, so re-running will regenerate them. If it persists, try a model with a larger output window.