
Releases: transcriptintel/transcribeit

v1.2.1

16 Mar 13:34
5c667e4


Fixes

  • Portable binary distribution — The binary now uses a relative rpath (@executable_path/lib on macOS, $ORIGIN/lib on Linux) instead of hardcoded build paths. Deploy by copying the binary plus a lib/ directory containing the sherpa-onnx dylibs.

Documentation

  • Binary distribution guide in README
  • Dylib troubleshooting for "Library not loaded" errors

Distribution layout

transcribeit
lib/
  libsherpa-onnx-c-api.dylib
  libonnxruntime.dylib

Run transcribeit setup on the target machine to download models.

v1.2.0

16 Mar 12:30
c6d60d9


v1.2.0 — Speaker diarization, VAD segmentation, self-bootstrapping

New features

  • Speaker diarization — --speakers N with pyannote segmentation + speaker embedding models. Labels speakers in VTT (<v Speaker 0>), SRT ([Speaker 0]), and manifest JSON.
  • VAD-based segmentation — Silero VAD detects speech boundaries with padding and gap merging. Avoids mid-word cuts that FFmpeg silencedetect can produce. Use --vad-model.
  • transcribeit setup — Self-bootstrapping command that downloads all components (STT models, VAD, diarization models, sherpa-onnx shared libraries) with platform auto-detection.
  • Auto-detect model architectures — sherpa-onnx engine detects Whisper, Moonshine, and SenseVoice from model files.
  • BSL 1.1 license — Free for non-commercial/evaluation use.
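The padding and gap-merging step in the VAD bullet above can be sketched roughly as follows. This is an illustrative sketch, not the actual implementation: `merge_spans`, the millisecond representation, and the threshold values are all assumptions.

```rust
/// Hypothetical sketch of VAD post-processing: adjacent speech spans
/// (start, end) in milliseconds are merged when the silence between them
/// is shorter than `max_gap_ms`, then each span is padded on both sides
/// (clamped at zero) so segment cuts land in silence, not mid-word.
fn merge_spans(spans: &[(i64, i64)], max_gap_ms: i64, pad_ms: i64) -> Vec<(i64, i64)> {
    let mut merged: Vec<(i64, i64)> = Vec::new();
    for &(start, end) in spans {
        match merged.last_mut() {
            // Gap to the previous span is small: extend it instead of starting a new one.
            Some(last) if start - last.1 <= max_gap_ms => last.1 = end.max(last.1),
            _ => merged.push((start, end)),
        }
    }
    // Pad the merged spans, never going below 0.
    merged
        .into_iter()
        .map(|(s, e)| ((s - pad_ms).max(0), e + pad_ms))
        .collect()
}

fn main() {
    let spans = [(500, 1000), (1200, 2000), (5000, 6000)];
    // The 200 ms gap merges; the 3000 ms gap does not; 100 ms padding applies after.
    assert_eq!(merge_spans(&spans, 500, 100), vec![(400, 2100), (4900, 6100)]);
}
```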

Improvements

  • Dependency upgrades: whisper-rs 0.16, reqwest 0.13, indicatif 0.18, bzip2 0.6
  • Fixed whisper-rs bug causing empty transcripts without --language
  • sherpa-onnx is optional (cargo build --no-default-features)
  • Suppressed sherpa-onnx C++ stderr warnings
  • Deduped retry/fallback loops in API engines
  • Static regex compilation for rate limit parsing
  • Negative timestamp guard in VTT/SRT formatters
  • Default output format changed to VTT, -f short flag added
  • download-model --vad and --diarize flags
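The negative-timestamp guard in the list above amounts to clamping before formatting. A minimal sketch of the idea — the `vtt_timestamp` helper is hypothetical, not the project's actual formatter:

```rust
/// Hypothetical sketch of the VTT/SRT negative-timestamp guard: some engines
/// can emit slightly negative start times for the first segment, which would
/// otherwise produce invalid cue timestamps.
fn vtt_timestamp(seconds: f64) -> String {
    // The guard: clamp negative values to zero before formatting.
    let s = seconds.max(0.0);
    let hours = (s / 3600.0) as u64;
    let minutes = ((s % 3600.0) / 60.0) as u64;
    let secs = s % 60.0;
    // VTT uses HH:MM:SS.mmm (SRT uses a comma instead of the dot).
    format!("{:02}:{:02}:{:06.3}", hours, minutes, secs)
}

fn main() {
    assert_eq!(vtt_timestamp(-0.5), "00:00:00.000"); // guard kicks in
    assert_eq!(vtt_timestamp(3725.5), "01:02:05.500");
}
```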

Quick start

# Bootstrap everything
transcribeit setup

# Transcribe (local whisper.cpp, VTT output by default)
transcribeit run -i recording.mp3 -m base -o ./output

# With speaker diarization
transcribeit run -i interview.mp3 -m large-v3-turbo --speakers 2 \
  --diarize-segmentation-model .cache/sherpa-onnx-pyannote-segmentation-3-0/model.onnx \
  --diarize-embedding-model .cache/wespeaker_en_voxceleb_CAM++.onnx -o ./output

# With VAD segmentation (sherpa-onnx)
transcribeit run -p sherpa-onnx -m base -i recording.mp3 \
  --vad-model .cache/silero_vad.onnx -o ./output

Performance (Apple Silicon, release build)

Audio length   Model            Time     Realtime factor
5 min          base             3.6s     83x
10 min         large-v3-turbo   69.9s    8.6x
31 min         large-v3-turbo   254s     7.5x

Build

cargo build --release                    # with sherpa-onnx
cargo build --release --no-default-features  # without sherpa-onnx

v1.1.0

14 Mar 15:46


Multi-provider speech-to-text CLI

Providers

  • Local — whisper.cpp via whisper-rs (GGML models)
  • Sherpa-ONNX — sherpa-onnx with Whisper ONNX models, dedicated worker thread, auto-segmentation into ≤30s chunks
  • OpenAI — OpenAI-compatible API (configurable base URL for LocalAI, vLLM, Qwen AST, etc.)
  • Azure OpenAI — Azure deployment-based auth with verbose_json caching

Features

  • Any audio/video input format — FFmpeg auto-converts to mono 16kHz
  • Silence-based segmentation with auto-split for API size limits (25MB)
  • VTT (default), SRT, and text output formats (-f vtt|srt|text)
  • JSON manifest with processing metadata and statistics
  • Model download and management for both GGML and ONNX formats
  • Rate limiting with retry (parses Retry-After header and error body)
  • Configurable timeouts, retries, and segment concurrency
  • Batch processing — directory or glob input
  • Language hinting (--language) and audio normalization (--normalize)
  • Model alias resolution (-m base instead of -m .cache/ggml-base.bin)
  • Progress spinner during transcription
  • In-memory model caching for batch/segmented processing
  • sherpa-onnx as optional feature flag (--no-default-features to exclude)
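The rate-limiting bullet above mentions parsing the Retry-After header. A minimal sketch of that idea, assuming the header carries a plain delta-seconds value — the function name and fallback behavior are illustrative, not the project's actual code (Retry-After can also be an HTTP-date, which here simply falls through to the default):

```rust
use std::time::Duration;

/// Hypothetical sketch of Retry-After handling: use the server-supplied
/// delay in seconds when present and parsable, otherwise fall back to a
/// default backoff.
fn retry_delay(retry_after: Option<&str>, default: Duration) -> Duration {
    retry_after
        .and_then(|v| v.trim().parse::<u64>().ok()) // HTTP-date forms fail to parse
        .map(Duration::from_secs)
        .unwrap_or(default)
}

fn main() {
    let fallback = Duration::from_secs(5);
    assert_eq!(retry_delay(Some("30"), fallback), Duration::from_secs(30));
    assert_eq!(retry_delay(Some("not-a-number"), fallback), fallback);
    assert_eq!(retry_delay(None, fallback), fallback);
}
```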

Quick start

# Download a model
transcribeit download-model -s base

# Transcribe
transcribeit run -i recording.mp3 -m base

# With output directory
transcribeit run -i meeting.mp4 -m base -o ./output

# OpenAI API
transcribeit run -p openai -i recording.mp3

# Azure
transcribeit run -p azure -i recording.mp3

# sherpa-onnx
transcribeit download-model -s tiny -f onnx
transcribeit run -p sherpa-onnx -m tiny -i recording.mp3

Build

Requires Rust 1.80+, FFmpeg, CMake, and a C/C++ toolchain.

cargo build --release

For sherpa-onnx support, set SHERPA_ONNX_LIB_DIR in .env to the directory containing the sherpa-onnx shared libraries.
To build without sherpa-onnx: cargo build --release --no-default-features
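The .env entry might look like the following; the path is only an example — point it at wherever the sherpa-onnx shared libraries live on your machine:

```shell
# .env — example path; adjust to your sherpa-onnx build or download location
SHERPA_ONNX_LIB_DIR=/opt/sherpa-onnx/lib
```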