AutoNote

AutoNote is a desktop application that automatically generates comprehensive study notes from Canvas LMS lecture materials and videos. It downloads slides and videos, transcribes audio, aligns the transcript to slide pages, and produces Markdown note files per lecture using an LLM.

Quick start

1. Launch the AutoNote AppImage / installer. On first launch, a setup wizard guides you through installing the ML environment.
2. Settings → enter Canvas URL and Canvas token → Save All → Refresh Courses
3. Settings → enter at least one LLM API key (OpenAI / Anthropic / Gemini), or use Claude CLI if you have Claude Code installed (no API key needed)
4. Pipeline → select course → click Run Pipeline

Interface overview

The app has seven pages, accessible from the left navigation rail:

Page	Purpose
Dashboard	Course overview: video / caption / alignment counts per course; click a course to view per-video status and delete generated files
Pipeline	One-click full pipeline wizard with pipelined execution
Download	Fine-grained control over video and material downloads
Transcribe	Transcribe downloaded videos to timestamped captions
Align	Map caption segments to specific slide pages
Generate Notes	Generate the final Markdown study notes
Settings	API keys, ML environment, model selection, tunable constants

Every page has a terminal panel pinned to the bottom that streams live output. The Stop button cancels the running process at any time.

First-time setup

Setup wizard (automatic)

On first launch, AutoNote checks if the required ML environment is installed. If not, a setup wizard appears with:

- Required components (pre-checked, cannot be skipped): Core Python packages for the pipeline
- Optional components (pre-checked, can be unchecked):
  - Local transcription (faster-whisper + GPU)
  - Local embeddings (sentence-transformers + FAISS)
  - BGE-M3 model for high-quality video-slide matching
  - Panopto video download (Playwright browser)

Click Install and Continue to install everything, or Skip for now to configure later in Settings.

Connection settings

Open Settings and fill in the Connection card:

Field	Value
Canvas URL	Your institution's Canvas domain (e.g. `canvas.nus.edu.sg`).
Panopto Host	Panopto video host (e.g. `mediaweb.ap.panopto.com`). Leave blank — it is auto-detected.
Output Dir	Directory where all pipeline files are stored (default: `~/AutoNote`).

API keys

Fill in the API Keys & Credentials card:

Field	Required for
Canvas Token	Downloading materials and listing videos. Get it from Canvas → Account → Settings → New Access Token.
OpenAI API Key	Note generation with OpenAI models (gpt-5.1, gpt-4.1, o3, ...).
Anthropic API Key	Note generation with Claude models via API.
Gemini API Key	Note generation with Gemini models; also enables the Google `text-embedding-004` option for video↔slide matching.
Jina API Key	Optional: enables the `jina-embeddings-v4` remote option (text + multimodal) for video↔slide matching.

Alternatively, if you have Claude Code installed on your computer, you can select Claude CLI (local) as the note generation model — no API key needed.

Click Save All, then Refresh Courses.
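
For readers who want to check a token outside the app, the sketch below builds the kind of request that lists courses. It assumes the token is sent as a Bearer token to the standard Canvas REST endpoint `/api/v1/courses`; the function name `build_courses_request` is hypothetical and is not taken from AutoNote's code. The request is only constructed here, not sent.

```python
import urllib.request


def build_courses_request(canvas_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists the user's Canvas courses."""
    # Accept either a bare domain ("canvas.nus.edu.sg") or a full URL.
    base = canvas_url.strip().rstrip("/")
    if not base.startswith(("http://", "https://")):
        base = "https://" + base
    # /api/v1/courses is the Canvas REST endpoint for listing courses.
    url = f"{base}/api/v1/courses?per_page=100"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


req = build_courses_request("canvas.nus.edu.sg", "YOUR_TOKEN")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the call; an HTTP 401 response usually means the token is wrong or expired.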

GPU note: Whisper large-v3 and sentence-transformer embeddings run on GPU. A CUDA-capable GPU with ≥ 8 GB VRAM is recommended. The app works on CPU, but transcription will be much slower.

Running the full pipeline

Go to Pipeline, select a course from the dropdown, and click Run Pipeline.

Pipeline steps

#	Step	What it does
1	Download materials	Downloads all lecture slides, PDFs, and other files from Canvas
2	Download videos	Downloads all Panopto lecture recordings (MP4)
3	Transcribe + Align	Transcribes videos, extracts frames from screen recordings, and aligns transcripts to slides. Uses pipelined threading: while video N+1 is transcribing, video N's frames are extracted concurrently.
4	Generate study notes	Sends slides + aligned transcripts to an LLM to generate one Markdown note file per lecture

Each step only runs if the previous step succeeded. You can uncheck any step to skip it.
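
The pipelined threading in step 3 can be pictured with the sketch below. It is a minimal illustration, not AutoNote's implementation: `transcribe` and `extract_frames` are hypothetical placeholders, and a single background worker is an assumption. Frame extraction for one video is handed to the worker, and the loop moves straight on to transcribing the next video.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def transcribe(video: str) -> str:
    # Placeholder for the real (GPU-bound) transcription step.
    return f"captions:{video}"


def extract_frames(video: str) -> str:
    # Placeholder for the real frame-extraction step.
    return f"frames:{video}"


def run_pipelined(videos: List[str]) -> List[Tuple[str, str]]:
    """Transcribe videos one at a time while frame extraction for the
    previously transcribed video runs on a background thread."""
    results: List[Tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None  # (captions, future) for the previous video
        for video in videos:
            # This call overlaps with the pending extraction, if any.
            captions = transcribe(video)
            if pending is not None:
                prev_captions, future = pending
                results.append((prev_captions, future.result()))
            pending = (captions, pool.submit(extract_frames, video))
        if pending is not None:
            prev_captions, future = pending
            results.append((prev_captions, future.result()))
    return results


print(run_pipelined(["lecture1.mp4", "lecture2.mp4"]))
```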

Pipeline options

Option	Description
Slack mode	Adds random delays between downloads to avoid rate-limiting.
Course name	Name that appears in the final notes file (auto-filled when you select a course).
Language	Language for generated notes: English or Chinese. Selectable per-run — notes are generated in English first, then translated. Changing language auto-regenerates cached sections.
Detail level	Controls note verbosity (0-2 outline, 3-5 bullets, 6-8 paragraphs, 9-10 exhaustive).
Lecture filter	Process only specific lectures, e.g. `1-5` or `1,3,5`. Leave blank for all.
Force regenerate	Re-run the selected pipeline steps even if output files already exist. Without this, only missing files are processed.
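
As an illustration of the Lecture filter syntax, the sketch below parses `1-5` and `1,3,5` into a set of lecture numbers. The function name is hypothetical, and accepting a mix such as `1-3,7` is an assumption rather than documented behaviour.

```python
from typing import Optional, Set


def parse_lecture_filter(spec: str) -> Optional[Set[int]]:
    """Parse "1-5" or "1,3,5" into a set of lecture numbers.

    Returns None for a blank filter, meaning "all lectures".
    """
    spec = spec.strip()
    if not spec:
        return None
    selected: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range such as "1-5".
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            selected.update(range(start, end + 1))
        else:
            selected.add(int(part))
    return selected


print(sorted(parse_lecture_filter("1-3,7")))
```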

Screen recording detection

For screen-share videos (slide recordings), the pipeline automatically:

1. Classifies the video as screen vs camera using heuristics
2. Extracts unique frames via scene detection + perceptual hash deduplication
3. Picks the most informative frame per slide page (handles incremental reveals)
4. Builds timestamp-based transcript-to-frame alignment

Camera recordings use traditional slide-based alignment with PDF/PPTX files.
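
The perceptual-hash deduplication used for screen recordings can be sketched as follows. This is a simplified stand-in: it uses an average hash over small grayscale thumbnails, whereas the hash AutoNote actually uses is not stated. The `max_distance` threshold and all function names are assumptions. It keeps the first frame of each near-duplicate group and does not try to pick the most informative one.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # small grayscale thumbnail, values 0-255


def average_hash(frame: Frame) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the frame's mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedupe_frames(frames: List[Frame], max_distance: int = 2) -> List[int]:
    """Return indices of frames that differ from every kept frame
    by more than max_distance bits."""
    kept: List[int] = []
    kept_hashes: List[int] = []
    for i, frame in enumerate(frames):
        h = average_hash(frame)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(i)
            kept_hashes.append(h)
    return kept


slide_a = [[0, 0, 255, 255], [0, 0, 255, 255]]
slide_a_noisy = [[3, 1, 250, 255], [0, 2, 252, 251]]  # same slide, compression noise
slide_b = [[255, 255, 0, 0], [255, 255, 0, 0]]
print(dedupe_frames([slide_a, slide_a_noisy, slide_b]))  # [0, 2]
```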

Video↔slide matching models

To pair each transcript with its slide file, the pipeline uses an embedding model. Select it with `python semantic_alignment.py --suggest-matches --match-model <MODEL>`:

Model	Type	Requires	Get key / model
`bge-m3` (default)	Local (`BAAI/bge-m3`, ~2.3 GB)	NVIDIA GPU ≥ 6 GB VRAM recommended (CUDA 11.8+); runs on CPU at ~10× slower. Auto-downloaded from huggingface.co/BAAI/bge-m3 on first use	—
`mpnet`	Local (`all-mpnet-base-v2`, ~420 MB)	Any GPU ≥ 2 GB VRAM or CPU (acceptable speed). Auto-downloaded from huggingface.co/sentence-transformers/all-mpnet-base-v2	—
`jina`	Remote (`jina-embeddings-v4`)	`JINA_API_KEY` env var or `jina_api.txt` in data dir	jina.ai/api-dashboard (free tier available)
`google`	Remote (`text-embedding-004`)	`GEMINI_API_KEY` / `GOOGLE_API_KEY` env var or `gemini_api.txt` in data dir	aistudio.google.com/apikey (free tier available)
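
The key lookup described in the Requires column (environment variable or key file) might look like the sketch below. The precedence (environment first, then file) and the helper name `resolve_api_key` are assumptions; the table only states that either source is accepted.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def resolve_api_key(env_vars: Iterable[str], key_file: Path) -> Optional[str]:
    """Return the first non-empty key from the environment, else from key_file."""
    for name in env_vars:
        value = os.environ.get(name, "").strip()
        if value:
            return value
    if key_file.is_file():
        value = key_file.read_text(encoding="utf-8").strip()
        if value:
            return value
    return None


# Assumes the data dir is ~/.auto_note, where the other key files live.
data_dir = Path.home() / ".auto_note"
gemini_key = resolve_api_key(["GEMINI_API_KEY", "GOOGLE_API_KEY"], data_dir / "gemini_api.txt")
print("Gemini key found" if gemini_key else "No Gemini key configured")
```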

GPU guidance for local models. Whisper large-v3 transcription dominates VRAM usage (~10 GB); embedding models fit alongside on a 12–16 GB card (RTX 3060 12 GB, 4070, 4080, etc.). With only 6–8 GB VRAM, prefer mpnet over bge-m3, or switch to a remote option so the GPU is free for Whisper. On CPU, remote (jina / google) is almost always faster than local embeddings.

Other API keys mentioned above. OpenAI: platform.openai.com/api-keys. Anthropic: console.anthropic.com/settings/keys. Canvas: Canvas → Account → Settings → New Access Token.

Remote options let the matcher run without any local ML environment — the google model reuses the same Gemini key used for note generation.
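
Whichever model produces the embeddings, the matching step itself reduces to a nearest-neighbour lookup. The sketch below assumes cosine similarity and uses toy three-dimensional vectors; the similarity measure AutoNote uses is not documented here, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_slide_match(transcript_vec: Sequence[float],
                     slide_vecs: Dict[str, Sequence[float]]) -> str:
    """Return the slide file whose embedding is most similar to the transcript's."""
    return max(slide_vecs, key=lambda name: cosine(transcript_vec, slide_vecs[name]))


# Toy embeddings; real ones would come from bge-m3, mpnet, jina or google.
slides = {"L01.pdf": [1.0, 0.0, 0.2], "L02.pdf": [0.0, 1.0, 0.1]}
print(best_slide_match([0.9, 0.1, 0.3], slides))  # L01.pdf
```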

Dashboard

Click any course card to open a detail modal showing all transcribed videos with their processing status:

- Aligned / No align: whether the video has been aligned to slides
- Notes / No notes: whether note sections exist for this video
- Delete button: removes the transcript, alignment, note sections, per-video notes, and rendered images for that video

Generate Notes page

Generates one Markdown note file per lecture (multi-part lectures are merged into a single file).

Options

Option	Description
Course name	Used as the note file name and title. Auto-filled when you select a course.
Language	English or Chinese — selectable per-run.
Lecture filter	Generate notes for specific lectures only.
Detail level	0-10 slider. Default: 7.
Force regenerate	Re-generate all sections. Without this, cached sections are reused.
Merge-only	Re-run the merge pass without re-generating any sections.
Iterative mode	Automatically raises detail level until quality target is met.
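
The detail-level bands listed under Pipeline options (0-2 outline, 3-5 bullets, 6-8 paragraphs, 9-10 exhaustive) map onto the slider as in the sketch below. The function name and the returned labels are illustrative only.

```python
def detail_style(level: int) -> str:
    """Map a 0-10 detail level to its documented verbosity band."""
    if not 0 <= level <= 10:
        raise ValueError("detail level must be between 0 and 10")
    if level <= 2:
        return "outline"
    if level <= 5:
        return "bullets"
    if level <= 8:
        return "paragraphs"
    return "exhaustive"


print(detail_style(7))  # paragraphs (the default level)
```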

Output structure

<Output Dir>/<course_id>/notes/
├── CS2105_L01_notes.md       Per-lecture note files
├── CS2105_L02_notes.md
├── sections/
│   ├── L01_S01.md            Cached per-chunk sections (resumable)
│   ├── L01_S02.md
│   └── .language             Language marker for cache invalidation
└── images/
    ├── L01/                  Rendered slide images
    └── L02/

Each note file includes source metadata at the top:

- **Video**: CS2105 Lecture on 1_23_2026 (Fri)
- **Slides**: Lecture 1 - Introduction.pdf

Notes are resumable: re-running skips sections already cached on disk. Only missing sections are generated. Truncated responses (LLM token limit) are detected and not cached, so they auto-regenerate on the next run.
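
The resume behaviour can be sketched as a small cache wrapper. This is an illustration under assumptions: the `generate` callback and its `(text, truncated)` return value are hypothetical, and how AutoNote actually detects truncation is not covered here.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# generate(section_id) -> (markdown_text, was_truncated); hypothetical interface.
Generator = Callable[[str], Tuple[str, bool]]


def get_section(cache_dir: Path, section_id: str, generate: Generator,
                force: bool = False) -> str:
    """Return the section text, reusing the cached file unless force is set."""
    path = cache_dir / f"{section_id}.md"
    if path.is_file() and not force:
        return path.read_text(encoding="utf-8")
    text, truncated = generate(section_id)
    if not truncated:
        # Only complete responses are cached, so truncated ones are retried next run.
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
    return text


calls: List[str] = []


def fake_llm(section_id: str) -> Tuple[str, bool]:
    calls.append(section_id)
    return f"# {section_id}", len(calls) == 1  # first response is "truncated"


with tempfile.TemporaryDirectory() as tmp:
    sections = Path(tmp) / "sections"
    get_section(sections, "L01_S01", fake_llm)  # truncated, not cached
    get_section(sections, "L01_S01", fake_llm)  # regenerated and cached
    get_section(sections, "L01_S01", fake_llm)  # served from cache
print(len(calls))  # 2
```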

Supported LLM models

Provider	Models
OpenAI	gpt-5.1, gpt-5.2, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3, o4-mini
Anthropic	Claude Opus 4.6, Sonnet 4.6, Sonnet 4.5, Haiku 4.5
Google	Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.0 Flash
DeepSeek	DeepSeek V3, DeepSeek R1
xAI	Grok 3, Grok 3 mini
Mistral	Mistral Large, Medium, Small, Codestral
Claude CLI	Uses local `claude -p` (no API key needed, requires Claude Code installed)

Use `--model` on the CLI to override the model, for example: `python note_generation.py --model claude-cli --language zh`

Incremental updates

Scenario	What to do
New lecture added to Canvas	Re-run the pipeline — only new sections are generated
Slides updated for a lecture	Generate Notes → Lecture filter + Force regenerate for that lecture
Want to change note language	Select the new language — cached sections auto-regenerate when the language differs
Want to delete bad notes	Dashboard → click course card → click Delete next to the video
Pipeline stopped partway	Just run again — completed steps are skipped
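
The language-change case relies on the `.language` marker in the `sections/` directory. A minimal sketch of that check is shown below. The marker's content (a plain language code), the `en` code, and the function name are assumptions.

```python
import tempfile
from pathlib import Path


def invalidate_if_language_changed(sections_dir: Path, language: str) -> bool:
    """Delete cached sections when the stored language marker differs.

    Returns True if the cache was cleared.
    """
    marker = sections_dir / ".language"
    previous = marker.read_text(encoding="utf-8").strip() if marker.is_file() else None
    changed = previous is not None and previous != language
    if changed:
        for cached in sections_dir.glob("*.md"):
            cached.unlink()
    # Record the language used for this run.
    sections_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text(language, encoding="utf-8")
    return changed


with tempfile.TemporaryDirectory() as tmp:
    sections = Path(tmp) / "sections"
    first = invalidate_if_language_changed(sections, "en")     # no marker yet
    (sections / "L01_S01.md").write_text("cached", encoding="utf-8")
    same = invalidate_if_language_changed(sections, "en")      # unchanged
    kept = (sections / "L01_S01.md").exists()
    switched = invalidate_if_language_changed(sections, "zh")  # language differs
    cleared = not (sections / "L01_S01.md").exists()
print(first, same, kept, switched, cleared)
```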

File layout

All pipeline output lives under the configured Output Dir (`~/AutoNote` by default):

~/AutoNote/
├── manifest.json                    Video download state
├── <course_id>/
│   ├── videos/                      Downloaded MP4 recordings
│   ├── materials/                   Downloaded slides, PDFs
│   ├── captions/                    Whisper transcripts (JSON)
│   ├── frames/                      Extracted screen recording frames
│   ├── alignment/                   Alignment JSON per slide file
│   └── notes/                       Generated notes + images
└── download_log.json                Material download log

App configuration and ML environment:

~/.auto_note/
├── config.json          Canvas URL, Panopto host, output dir
├── canvas_token.txt     Canvas API token
├── openai_api.txt       OpenAI API key
├── anthropic_key.txt    Anthropic API key
├── gemini_api.txt       Gemini API key
├── scripts/             Pipeline scripts (installed from AppImage)
└── venv/                ML virtual environment

Windows installation

Two installer formats are available:

- NSIS (.exe): traditional installer
- MSIX (.appx): modern Windows package (less SmartScreen friction)

When uninstalling from Windows Settings, a dialog asks what to keep:

- Generated notes and downloads (kept by default)
- Settings and API keys (kept by default)
- ML environment, ~2 GB (deleted by default)

Troubleshooting

"No courses loaded" on the Dashboard → Go to Settings, enter Canvas URL and Canvas token, click Save All, then Refresh Courses.

Video list shows 0 videos for a course → The Panopto host has not been detected yet. Click List videos once to trigger auto-detection. If it still fails, enter the Panopto domain manually in Settings.

Transcription is very slow → Transcription is probably running on CPU. Check that a CUDA-capable GPU is present and that PyTorch was installed with GPU support. Use Settings → ML Environment → Reinstall if needed.

ModuleNotFoundError when running the pipeline → The ML environment is missing packages. Go to Settings → ML Environment → Reinstall.

Notes are in the wrong language → Select the correct language from the Pipeline or Generate page dropdown. If cached sections exist in the old language, they are automatically regenerated.

Notes have truncated sections → The LLM hit its token limit. Truncated sections are not cached, so re-running will regenerate them. If it persists, try a model with a larger output window.
