# Releases · transcriptintel/transcribeit
## v1.2.1

### Fixes
- Portable binary distribution — the binary now uses a relative rpath (`@executable_path/lib` on macOS, `$ORIGIN/lib` on Linux) instead of hardcoded build paths. Deploy by copying the binary plus a `lib/` directory containing the sherpa-onnx dylibs.
### Documentation

- Binary distribution guide in README
- Dylib troubleshooting for `Library not loaded` errors
### Distribution layout

```
transcribeit
lib/
  libsherpa-onnx-c-api.dylib
  libonnxruntime.dylib
```
Run `transcribeit setup` on the target machine to download models.
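The layout above can be staged with a short script. This is a minimal sketch: `dist/` is a hypothetical target directory, and the `touch` calls are placeholders for copying your actual release binary and dylibs.

```shell
# Stage the portable layout: binary at the top level, dylibs in lib/.
# The touch commands below stand in for the real build artifacts.
mkdir -p dist/lib
touch dist/transcribeit                    # your release binary
touch dist/lib/libsherpa-onnx-c-api.dylib  # sherpa-onnx C API
touch dist/lib/libonnxruntime.dylib        # ONNX Runtime
ls -R dist
```

Because the rpath is relative, the binary finds its dylibs wherever `dist/` ends up, as long as `lib/` travels with it.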
## v1.2.0 — Speaker diarization, VAD segmentation, self-bootstrapping

### New features
- Speaker diarization — `--speakers N` with pyannote segmentation + speaker embedding models. Labels speakers in VTT (`<v Speaker 0>`), SRT (`[Speaker 0]`), and manifest JSON.
- VAD-based segmentation — Silero VAD detects speech boundaries with padding and gap merging, avoiding the mid-word cuts that FFmpeg `silencedetect` can produce. Use `--vad-model`.
- `transcribeit setup` — self-bootstrapping command that downloads all components (STT models, VAD, diarization models, sherpa-onnx shared libraries) with platform auto-detection.
- Auto-detected model architectures — the sherpa-onnx engine detects Whisper, Moonshine, and SenseVoice from model files.
- BSL 1.1 license — free for non-commercial/evaluation use.
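To illustrate the speaker labels, here is a sample of what diarized WebVTT output could look like; the timestamps and dialogue are invented, but the `<v Speaker N>` voice tags follow the convention described above.

```
WEBVTT

00:00:00.000 --> 00:00:03.500
<v Speaker 0>Thanks for joining me today.

00:00:03.500 --> 00:00:06.200
<v Speaker 1>Happy to be here.
```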
### Improvements
- Dependency upgrades: whisper-rs 0.16, reqwest 0.13, indicatif 0.18, bzip2 0.6
- Fixed a whisper-rs bug causing empty transcripts without `--language`
- sherpa-onnx is now optional (`cargo build --no-default-features`)
- Suppressed sherpa-onnx C++ stderr warnings
- Deduplicated retry/fallback loops in the API engines
- Static regex compilation for rate-limit parsing
- Negative-timestamp guard in the VTT/SRT formatters
- Default output format changed to VTT; `-f` short flag added
- `download-model --vad` and `--diarize` flags
### Quick start

```bash
# Bootstrap everything
transcribeit setup

# Transcribe (local whisper.cpp, VTT output by default)
transcribeit run -i recording.mp3 -m base -o ./output

# With speaker diarization
transcribeit run -i interview.mp3 -m large-v3-turbo --speakers 2 \
  --diarize-segmentation-model .cache/sherpa-onnx-pyannote-segmentation-3-0/model.onnx \
  --diarize-embedding-model .cache/wespeaker_en_voxceleb_CAM++.onnx -o ./output

# With VAD segmentation (sherpa-onnx)
transcribeit run -p sherpa-onnx -m base -i recording.mp3 \
  --vad-model .cache/silero_vad.onnx -o ./output
```

### Performance (Apple Silicon, release build)
| Audio length | Model | Time | Realtime factor |
|---|---|---|---|
| 5 min | base | 3.6s | 83x |
| 10 min | large-v3-turbo | 69.9s | 8.6x |
| 31 min | large-v3-turbo | 254s | 7.5x |
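Realtime factor here is audio duration divided by processing time. As a quick check against the first row of the table:

```shell
# 5 minutes of audio = 300 s, processed in 3.6 s
awk 'BEGIN { printf "%.0fx\n", 300 / 3.6 }'   # prints "83x"
```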
### Build

```bash
cargo build --release                        # with sherpa-onnx
cargo build --release --no-default-features  # without sherpa-onnx
```

## v1.1.0 — Multi-provider speech-to-text CLI
### Providers

- Local — whisper.cpp via whisper-rs (GGML models)
- Sherpa-ONNX — sherpa-onnx with Whisper ONNX models, dedicated worker thread, auto-segmentation at ≤30 s
- OpenAI — OpenAI-compatible API (configurable base URL for LocalAI, vLLM, Qwen AST, etc.)
- Azure OpenAI — Azure deployment-based auth with `verbose_json` caching
### Features

- Any audio/video input format — FFmpeg auto-converts to mono 16 kHz
- Silence-based segmentation with auto-split for API size limits (25 MB)
- VTT (default), SRT, and text output formats (`-f vtt|srt|text`)
- JSON manifest with processing metadata and statistics
- Model download and management for both GGML and ONNX formats
- Rate limiting with retry (parses the Retry-After header and error body)
- Configurable timeouts, retries, and segment concurrency
- Batch processing — directory or glob input
- Language hinting (`--language`) and audio normalization (`--normalize`)
- Model alias resolution (`-m base` instead of `-m .cache/ggml-base.bin`)
- Progress spinner during transcription
- In-memory model caching for batch/segmented processing
- sherpa-onnx as an optional feature flag (`--no-default-features` to exclude)
### Quick start

```bash
# Download a model
transcribeit download-model -s base

# Transcribe
transcribeit run -i recording.mp3 -m base

# With output directory
transcribeit run -i meeting.mp4 -m base -o ./output

# OpenAI API
transcribeit run -p openai -i recording.mp3

# Azure
transcribeit run -p azure -i recording.mp3

# sherpa-onnx
transcribeit download-model -s tiny -f onnx
transcribeit run -p sherpa-onnx -m tiny -i recording.mp3
```

### Build
Requires Rust 1.80+, FFmpeg, CMake, and a C/C++ toolchain.
```bash
cargo build --release
```

For sherpa-onnx support, set `SHERPA_ONNX_LIB_DIR` in `.env` to point at the sherpa-onnx shared libraries. To build without sherpa-onnx:

```bash
cargo build --release --no-default-features
```
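As a sketch, the `.env` entry might look like the following; the path shown is a placeholder for wherever your sherpa-onnx shared libraries actually live.

```
# .env (placeholder path: point this at your sherpa-onnx lib directory)
SHERPA_ONNX_LIB_DIR=/opt/sherpa-onnx/lib
```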