# Releases · transcriptintel/transcribeit
## v1.2.1

### Fixes
- Portable binary distribution — the binary now uses a relative rpath (`@executable_path/lib` on macOS, `$ORIGIN/lib` on Linux) instead of hardcoded build paths. Deploy by copying the binary plus a `lib/` directory containing the sherpa-onnx dylibs.
### Documentation

- Binary distribution guide in README
- Dylib troubleshooting for `Library not loaded` errors
### Distribution layout

```
transcribeit
lib/
  libsherpa-onnx-c-api.dylib
  libonnxruntime.dylib
```
Run `transcribeit setup` on the target machine to download models.
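The layout above can be staged with a short script. This is a minimal sketch: `dist/` is a hypothetical target directory, and the `touch` calls are placeholders for copying your actual release binary and dylibs.

```shell
# Stage the portable layout: binary at the top level, dylibs in lib/.
# The touch commands below stand in for the real build artifacts.
mkdir -p dist/lib
touch dist/transcribeit                    # your release binary
touch dist/lib/libsherpa-onnx-c-api.dylib  # sherpa-onnx C API
touch dist/lib/libonnxruntime.dylib        # ONNX Runtime
ls -R dist
```

Because the rpath is relative, the binary finds its dylibs wherever `dist/` ends up, as long as `lib/` travels with it.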
## v1.2.0 — Speaker diarization, VAD segmentation, self-bootstrapping

### New features
- Speaker diarization — `--speakers N` with pyannote segmentation + speaker embedding models. Labels speakers in VTT (`<v Speaker 0>`), SRT (`[Speaker 0]`), and manifest JSON.
- VAD-based segmentation — Silero VAD detects speech boundaries with padding and gap merging, avoiding the mid-word cuts that FFmpeg `silencedetect` can produce. Use `--vad-model`.
- `transcribeit setup` — self-bootstrapping command that downloads all components (STT models, VAD, diarization models, sherpa-onnx shared libraries) with platform auto-detection.
- Auto-detected model architectures — the sherpa-onnx engine detects Whisper, Moonshine, and SenseVoice from model files.
- BSL 1.1 license — free for non-commercial/evaluation use.
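To illustrate the speaker labels, here is a sample of what diarized WebVTT output could look like; the timestamps and dialogue are invented, but the `<v Speaker N>` voice tags follow the convention described above.

```
WEBVTT

00:00:00.000 --> 00:00:03.500
<v Speaker 0>Thanks for joining me today.

00:00:03.500 --> 00:00:06.200
<v Speaker 1>Happy to be here.
```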
### Improvements
- Dependency upgrades: whisper-rs 0.16, reqwest 0.13, indicatif 0.18, bzip2 0.6
- Fixed a whisper-rs bug causing empty transcripts without `--language`
- sherpa-onnx is now optional (`cargo build --no-default-features`)
- Suppressed sherpa-onnx C++ stderr warnings
- Deduplicated retry/fallback loops in the API engines
- Static regex compilation for rate-limit parsing
- Negative-timestamp guard in the VTT/SRT formatters
- Default output format changed to VTT; `-f` short flag added
- `download-model --vad` and `--diarize` flags
### Quick start

```bash
# Bootstrap everything
transcribeit setup

# Transcribe (local whisper.cpp, VTT output by default)
transcribeit run -i recording.mp3 -m base -o ./output

# With speaker diarization
transcribeit run -i interview.mp3 -m large-v3-turbo --speakers 2 \
  --diarize-segmentation-model .cache/sherpa-onnx-pyannote-segmentation-3-0/model.onnx \
  --diarize-embedding-model .cache/wespeaker_en_voxceleb_CAM++.onnx -o ./output

# With VAD segmentation (sherpa-onnx)
transcribeit run -p sherpa-onnx -m base -i recording.mp3 \
  --vad-model .cache/silero_vad.onnx -o ./output
```

### Performance (Apple Silicon, release build)
| Audio length | Model | Time | Realtime factor |
|---|---|---|---|
| 5 min | base | 3.6s | 83x |
| 10 min | large-v3-turbo | 69.9s | 8.6x |
| 31 min | large-v3-turbo | 254s | 7.5x |
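Realtime factor here is audio duration divided by processing time. As a quick check against the first row of the table:

```shell
# 5 minutes of audio = 300 s, processed in 3.6 s
awk 'BEGIN { printf "%.0fx\n", 300 / 3.6 }'   # prints "83x"
```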
### Build

```bash
cargo build --release                        # with sherpa-onnx
cargo build --release --no-default-features  # without sherpa-onnx
```

## v1.1.0 — Multi-provider speech-to-text CLI
### Providers

- Local — whisper.cpp via whisper-rs (GGML models)
- Sherpa-ONNX — sherpa-onnx with Whisper ONNX models, dedicated worker thread, auto-segmentation at ≤30 s
- OpenAI — OpenAI-compatible API (configurable base URL for LocalAI, vLLM, Qwen AST, etc.)
- Azure OpenAI — Azure deployment-based auth with `verbose_json` caching
### Features

- Any audio/video input format — FFmpeg auto-converts to mono 16 kHz
- Silence-based segmentation with auto-split for API size limits (25 MB)
- VTT (default), SRT, and text output formats (`-f vtt|srt|text`)
- JSON manifest with processing metadata and statistics
- Model download and management for both GGML and ONNX formats
- Rate limiting with retry (parses the Retry-After header and error body)
- Configurable timeouts, retries, and segment concurrency
- Batch processing — directory or glob input
- Language hinting (`--language`) and audio normalization (`--normalize`)
- Model alias resolution (`-m base` instead of `-m .cache/ggml-base.bin`)
- Progress spinner during transcription
- In-memory model caching for batch/segmented processing
- sherpa-onnx as an optional feature flag (`--no-default-features` to exclude)
### Quick start

```bash
# Download a model
transcribeit download-model -s base

# Transcribe
transcribeit run -i recording.mp3 -m base

# With output directory
transcribeit run -i meeting.mp4 -m base -o ./output

# OpenAI API
transcribeit run -p openai -i recording.mp3

# Azure
transcribeit run -p azure -i recording.mp3

# sherpa-onnx
transcribeit download-model -s tiny -f onnx
transcribeit run -p sherpa-onnx -m tiny -i recording.mp3
```

### Build
Requires Rust 1.80+, FFmpeg, CMake, and a C/C++ toolchain.
```bash
cargo build --release
```

For sherpa-onnx support, set `SHERPA_ONNX_LIB_DIR` in `.env` to point at the sherpa-onnx shared libraries. To build without sherpa-onnx:

```bash
cargo build --release --no-default-features
```
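As a sketch, the `.env` entry might look like the following; the path shown is a placeholder for wherever your sherpa-onnx shared libraries actually live.

```
# .env (placeholder path: point this at your sherpa-onnx lib directory)
SHERPA_ONNX_LIB_DIR=/opt/sherpa-onnx/lib
```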