feat(vscode): multi-provider speech synthesis for AI responses by Ghenghis · Pull Request #8839 · Kilo-Org/kilocode

Ghenghis · 2026-04-13T10:21:04Z

Summary

Adds a Speech tab to Settings with 6 text-to-speech providers, all with free tiers:

Browser (default) — Web Speech API, offline, no setup
Azure Cognitive Services — 500K chars/month free, SSML, 125+ voices
Google Cloud TTS — 4M chars/month free, Neural2 + Studio voices
OpenAI TTS — $5 free credit, 10 voices
ElevenLabs — 10K chars/month free, expressive voices
Amazon Polly — 5M chars/month free (12 months), SSML

Architecture

SpeechProvider interface + SpeechProviderRegistry (matches upstream provider pattern)
Provider-agnostic playback with LRU cache (32 entries)
25-rule text filter with sentiment detection
Per-provider capabilities gating (SSML, styles, emphasis, pronunciations)
Auto-speak, interrupt-on-type, voice favorites & presets
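
The registry part of this architecture might look roughly like the sketch below. It is illustrative only: `SketchProvider`, `ProviderTier`, and the field names are assumptions rather than the PR's actual `SpeechProvider` interface. Only the `register`/`get`/`list`/`listByTier` operation names and the free / free-tier grouping come from the commit messages.

```typescript
// Hypothetical shapes; the PR's real SpeechProvider interface is not shown in this thread.
type ProviderTier = "free" | "free-tier"

interface SketchProvider {
  id: string
  name: string
  tier: ProviderTier
  requiresApiKey: boolean
}

class SketchProviderRegistry {
  private providers = new Map<string, SketchProvider>()

  // Reject duplicate ids so a later registration cannot silently replace a provider.
  register(provider: SketchProvider): void {
    if (this.providers.has(provider.id)) {
      throw new Error("provider already registered: " + provider.id)
    }
    this.providers.set(provider.id, provider)
  }

  get(id: string): SketchProvider | undefined {
    return this.providers.get(id)
  }

  // Map preserves insertion order, so providers are listed in registration order.
  list(): SketchProvider[] {
    return Array.from(this.providers.values())
  }

  listByTier(tier: ProviderTier): SketchProvider[] {
    return this.list().filter((p) => p.tier === tier)
  }
}

const registry = new SketchProviderRegistry()
registry.register({ id: "browser", name: "Browser", tier: "free", requiresApiKey: false })
registry.register({ id: "azure", name: "Azure Cognitive Services", tier: "free-tier", requiresApiKey: true })
```

Keeping the lookup behind `get(id)` lets the settings UI and the auto-speak path resolve a provider from a stored id without importing each implementation directly.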

Key files

webview-ui/src/types/voice.ts — Core type definitions
webview-ui/src/data/speech-providers.ts — Registry
webview-ui/src/utils/speech-providers/ — 6 provider implementations
webview-ui/src/utils/speech-playback.ts — Unified playback engine
webview-ui/src/components/settings/SpeechTab.tsx — Settings UI

Test plan

95 unit tests passing (bun:test): registry, browser-provider, azure-provider, text-filter
ESLint: 0 errors across 14 speech files
esbuild: 5 bundles, 0 errors
VSIX built and installed in VS Code
Manual: enable speech, test Browser provider (no API key needed)
Manual: test Azure/Google/OpenAI/ElevenLabs/Polly with free-tier keys

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

kilo-code-bot · 2026-04-13T10:25:33Z

packages/kilo-vscode/webview-ui/src/App.tsx

+  const [speechSettings, setSpeechSettings] = createSignal<SpeechSettings | null>(null)
+  let lastSpokenMessageId = ""
+
+  onMount(() => {


WARNING: Speech settings never refresh after the initial load

AppContent requests speechSettingsLoaded once on mount, but SpeechTab only sends updateSetting messages and the extension never pushes a refreshed settings payload back. In practice, toggling enabled, autoSpeak, or interruptOnType in the current webview will not change auto-speak behavior until the webview is reloaded.
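
One possible way to close this loop is for the extension to answer every `updateSetting` with a fresh `speechSettingsLoaded` payload, and for the webview to apply every such payload rather than only the first. The sketch below assumes simplified message and settings shapes (`SketchSpeechSettings`, `ToExtension`, `ToWebview` are hypothetical); only the two message names and the three toggles come from the comment above.

```typescript
// Hypothetical, simplified shapes for illustration.
interface SketchSpeechSettings {
  enabled: boolean
  autoSpeak: boolean
  interruptOnType: boolean
}

type ToExtension = { type: "updateSetting"; key: keyof SketchSpeechSettings; value: boolean }
type ToWebview = { type: "speechSettingsLoaded"; settings: SketchSpeechSettings }

// Extension side: persist the change, then push the full refreshed payload back.
function handleUpdateSetting(
  stored: SketchSpeechSettings,
  msg: ToExtension,
  post: (m: ToWebview) => void,
): SketchSpeechSettings {
  const next: SketchSpeechSettings = { ...stored, [msg.key]: msg.value }
  post({ type: "speechSettingsLoaded", settings: next })
  return next
}

// Webview side: every speechSettingsLoaded message replaces the current state.
function applyExtensionMessage(
  current: SketchSpeechSettings | null,
  msg: ToWebview,
): SketchSpeechSettings | null {
  return msg.type === "speechSettingsLoaded" ? msg.settings : current
}
```

In the real webview the reducer result would feed the `setSpeechSettings` signal setter, so the auto-speak effect sees the new toggles without a reload.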

kilo-code-bot · 2026-04-13T10:25:33Z

packages/kilo-vscode/webview-ui/src/App.tsx

+          region: ss.azure.region,
+          apiKey: ss.azure.apiKey,
+          voiceId: ss.azure.voiceId,
+          pitch: ss.tuning.pitch + sentiment.pitchModifier,


WARNING: sentimentIntensity has no effect on synthesis

The new slider is persisted in settings, but the auto-speak path always applies the full detectSentiment() modifiers here. Changing kilo-code.new.speech.sentimentIntensity never scales these deltas, so the user-facing control does nothing.
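
A minimal sketch of how the slider could be applied, assuming `sentimentIntensity` is a 0 to 1 multiplier and that `detectSentiment()` returns additive deltas. Both are assumptions; `rateModifier` and `scaleSentiment` are hypothetical names, and only `pitchModifier` appears in the diff above.

```typescript
// Hypothetical shape of the deltas returned by detectSentiment().
interface SentimentModifiers {
  pitchModifier: number
  rateModifier: number
}

// Scale both deltas by the user-facing intensity, clamped to [0, 1].
// 0 disables sentiment shaping entirely; 1 applies the full detected deltas.
function scaleSentiment(mods: SentimentModifiers, intensity: number): SentimentModifiers {
  const k = Math.min(1, Math.max(0, intensity))
  return {
    pitchModifier: mods.pitchModifier * k,
    rateModifier: mods.rateModifier * k,
  }
}
```

With a helper like this, the call site would add the scaled `pitchModifier` to `ss.tuning.pitch` instead of the raw one.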

kilo-code-bot · 2026-04-13T10:25:33Z

packages/kilo-vscode/webview-ui/src/utils/speech-playback.ts

+	ensureAudioReady()
+
+	_abortController = new AbortController()
+	const cacheKey = SynthesisCache.hash(text, opts.voiceId, opts.style ?? "default", opts.pitch ?? 0, opts.rate ?? 1.0)


WARNING: Cache key omits several tuning inputs

The synthesis cache only keys on text, voice, style, pitch, and rate. Changing styleDegree, emphasis, pronunciations, or audioFormat can still reuse stale audio from a previous request, so preview and auto-speak will not reliably reflect the current settings.
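
As an illustration, the key could be derived from every input that affects the audio. The option shape below is hypothetical (the field names follow the comment above, the types and defaults are guesses), and the resulting string would still be passed through whatever hashing `SynthesisCache.hash` performs.

```typescript
// Hypothetical option shape; field names follow the review comment, types are assumed.
interface SketchSynthesisOptions {
  voiceId: string
  style?: string
  styleDegree?: number
  pitch?: number
  rate?: number
  emphasis?: string
  pronunciations?: { word: string; alias: string }[]
  audioFormat?: string
}

// Serialize every audio-affecting input in a fixed order so that any change yields a new key.
function synthesisCacheKey(providerId: string, text: string, opts: SketchSynthesisOptions): string {
  const pronunciations = (opts.pronunciations ?? []).map((p) => [p.word, p.alias])
  return JSON.stringify([
    providerId,
    text,
    opts.voiceId,
    opts.style ?? "default",
    opts.styleDegree ?? 1,
    opts.pitch ?? 0,
    opts.rate ?? 1,
    opts.emphasis ?? "none",
    pronunciations,
    opts.audioFormat ?? "default",
  ])
}
```

Using `JSON.stringify` on an array avoids ambiguous concatenation, for example a text ending in a digit running into a numeric field.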

kilo-code-bot · 2026-04-13T10:25:33Z

packages/kilo-vscode/webview-ui/src/utils/speech-text-filter.ts

+
+  // 6. Remove diff hunks (@@ ... @@, +/- lines)
+  result = result.replace(/^@@\s.*@@.*$/gm, "")
+  result = result.replace(/^[+-]{1,3}\s.*$/gm, "")


WARNING: This strips normal markdown bullet lists, not just diff hunks

/^[+-]{1,3}\s.*$/gm matches ordinary - item and + item list lines. Because assistant responses in this UI are commonly formatted as bullet lists, auto-speak will drop large chunks of normal prose before it ever reaches Azure TTS.
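
One narrower heuristic, sketched here under the assumption that real diff content always follows an `@@ ... @@` hunk header, is to drop `+`/`-`/context lines only while inside a hunk. `stripDiffHunks` is a hypothetical helper, not the filter's existing rule, and it still has a blind spot: a bullet list that directly follows a hunk with no separating line would be dropped too.

```typescript
// Remove unified-diff hunks while leaving ordinary "- item" / "+ item" bullets alone.
// A hunk starts at an "@@ ... @@" header and ends at the first line that does not
// begin with "+", "-", or a space (which includes an empty line).
function stripDiffHunks(text: string): string {
  const kept: string[] = []
  let inHunk = false
  for (const line of text.split("\n")) {
    if (/^@@\s.*@@/.test(line)) {
      inHunk = true
      continue
    }
    if (inHunk && /^[+\- ]/.test(line)) continue
    inHunk = false
    kept.push(line)
  }
  return kept.join("\n")
}
```

For instance, a reply that contains a two-item bullet list followed by a hunk keeps the bullets and loses only the hunk header and its `+`/`-`/context lines.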

kilo-code-bot · 2026-04-13T10:26:00Z

Code Review Summary

Status: 10 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 9 |
| SUGGESTION | 1 |

Issue Details

WARNING

| File | Line | Issue |
| --- | --- | --- |
| `packages/kilo-vscode/webview-ui/src/App.tsx` | 235 | Speech settings are only loaded once, so changing speech toggles in the same webview does not affect auto-speak until reload. |
| `packages/kilo-vscode/webview-ui/src/App.tsx` | 287 | `sentimentIntensity` is persisted but never applied when computing pitch/rate modifiers. |
| `packages/kilo-vscode/webview-ui/src/utils/speech-playback.ts` | 29 | The synthesis cache key omits tuning fields like `styleDegree`, `emphasis`, `pronunciations`, and `audioFormat`, which can replay stale audio. |
| `packages/kilo-vscode/webview-ui/src/utils/speech-text-filter.ts` | 56 | The diff-line regex also matches normal markdown bullets, causing valid assistant prose to be dropped before synthesis. |
| `packages/kilo-vscode/webview-ui/src/components/settings/SpeechTab.tsx` | 364 | Switching away from Azure still falls back to `speech.azure.voiceId`, so previews use an invalid voice id until the user manually reselects one. |
| `packages/kilo-vscode/webview-ui/src/components/settings/SpeechTab.tsx` | 1065 | The audio format select always uses Azure-specific values instead of the active provider's advertised formats. |
| `packages/kilo-vscode/webview-ui/src/utils/speech-providers/polly-provider.ts` | 99 | Polly requests use an `X-Api-Key` header instead of AWS SigV4 signing, so synthesis calls will be rejected. |
| `packages/app/e2e/fixtures.ts` | 156 | The new `seedModel` override is never used because localStorage still hard-codes `kilo/mistralai/codestral-2508`, so env-configured e2e runs can seed the wrong model. |
| `script/changelog.ts` | 51 | Changelog generation now shells out to a global `kilo` binary, which makes `bun script/changelog.ts` fail in a fresh checkout that only has repo dependencies installed. |

SUGGESTION

| File | Line | Issue |
| --- | --- | --- |
| `README.md` | 69 | For markdown documentation, use markdown image syntax like `![Image Name](./path.png)` instead of HTML `<img>` tags. |

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

None.

Files Reviewed (27 files)

docs/plans/2026-04-13-multi-provider-speech-design.md - 0 issues
docs/plans/2026-04-13-multi-provider-speech-implementation.md - 0 issues
packages/kilo-vscode/eslint.config.mjs - 0 issues
packages/kilo-vscode/package.json - 0 issues
packages/kilo-vscode/src/KiloProvider.ts - 0 issues
packages/kilo-vscode/src/webview-html-utils.ts - 0 issues
packages/kilo-vscode/tests/unit/azure-provider.test.ts - 0 issues
packages/kilo-vscode/tests/unit/browser-provider.test.ts - 0 issues
packages/kilo-vscode/tests/unit/speech-provider-registry.test.ts - 0 issues
packages/kilo-vscode/tests/unit/speech-text-filter.test.ts - 0 issues
packages/kilo-vscode/webview-ui/src/App.tsx - 2 issues
packages/kilo-vscode/webview-ui/src/components/settings/SpeechTab.tsx - 2 issues
packages/kilo-vscode/webview-ui/src/data/speech-providers.ts - 0 issues
packages/kilo-vscode/webview-ui/src/types/voice.ts - 0 issues
packages/kilo-vscode/webview-ui/src/utils/speech-playback.ts - 1 issue
packages/kilo-vscode/webview-ui/src/utils/speech-providers/azure-provider.ts - 0 issues
packages/kilo-vscode/webview-ui/src/utils/speech-providers/browser-provider.ts - 0 issues
packages/kilo-vscode/webview-ui/src/utils/speech-providers/elevenlabs-provider.ts - 0 issues
packages/kilo-vscode/webview-ui/src/utils/speech-providers/google-provider.ts - 0 issues
packages/kilo-vscode/webview-ui/src/utils/speech-providers/openai-provider.ts - 0 issues
packages/kilo-vscode/webview-ui/src/utils/speech-providers/polly-provider.ts - 1 issue
packages/kilo-vscode/src/agent-manager/run/service.ts - 0 issues
packages/opencode/src/cli/cmd/tui/plugin/runtime.ts - 0 issues
script/version.ts - 0 issues
script/changelog.ts - 1 issue
packages/app/e2e/fixtures.ts - 1 issue
README.md - 1 issue

Reviewed by gpt-5.4-20260305 · 4,072,210 tokens

kilo-code-bot · 2026-04-13T15:53:11Z

packages/kilo-vscode/webview-ui/src/components/settings/SpeechTab.tsx

+			await speak(previewText(), p, {
+				region: getRegion() || undefined,
+				apiKey: getApiKey(),
+				voiceId: voiceId ?? s.azure.voiceId,


WARNING: Non-Azure providers still fall back to an Azure voice ID

When the user switches providers, handleProviderChange() clears the selected voice but leaves the persisted fallback in speech.azure.voiceId. Preview then sends en-GB-MaisieNeural (or another Azure-specific id) to Google/OpenAI/ElevenLabs/Polly until the user manually picks a voice, and those providers do not recognize that id.
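
A sketch of a provider-aware fallback, assuming each provider exposes its voice list and that its first voice is an acceptable default. `SketchVoice`, `SketchVoiceProvider`, and `resolveVoiceId` are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical minimal shapes for illustration.
interface SketchVoice {
  id: string
  name: string
}

interface SketchVoiceProvider {
  id: string
  voices: SketchVoice[]
}

// Use the selected voice only if the active provider actually offers it;
// otherwise fall back to that provider's first voice (undefined if it has none).
function resolveVoiceId(provider: SketchVoiceProvider, selected: string | undefined): string | undefined {
  if (selected !== undefined && provider.voices.some((v) => v.id === selected)) return selected
  return provider.voices[0]?.id
}
```

The preview call could then pass `resolveVoiceId(p, voiceId)` rather than `voiceId ?? s.azure.voiceId`, so an Azure id never reaches another provider.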

kilo-code-bot · 2026-04-13T15:53:11Z

packages/kilo-vscode/webview-ui/src/components/settings/SpeechTab.tsx

+							description="Higher quality sounds better but uses more bandwidth and API quota"
+						>
+							<Select
+								options={AUDIO_FORMATS}


WARNING: Audio format options are not provider-specific

This select always uses the Azure AUDIO_FORMATS values even when the active provider advertises a different capabilities.audioFormats set. For example, Google expects MP3/OGG_OPUS/LINEAR16, so choosing one of these Azure-only values will generate invalid synth requests.
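
A minimal sketch of deriving the select options from the active provider instead, assuming `capabilities.audioFormats` is a plain list of format ids. `SketchCapabilities`, `audioFormatOptions`, and `resolveAudioFormat` are hypothetical names.

```typescript
// Hypothetical shapes; only the audioFormats capability name comes from the review.
interface FormatOption {
  value: string
  label: string
}

interface SketchCapabilities {
  audioFormats: string[]
}

// Build the select options from whatever the active provider advertises.
function audioFormatOptions(caps: SketchCapabilities): FormatOption[] {
  return caps.audioFormats.map((f) => ({ value: f, label: f }))
}

// Keep the saved format only if the active provider supports it;
// otherwise use the provider's first advertised format.
function resolveAudioFormat(caps: SketchCapabilities, saved: string | undefined): string | undefined {
  if (saved !== undefined && caps.audioFormats.indexOf(saved) !== -1) return saved
  return caps.audioFormats[0]
}
```

Resolving the saved value on provider change matters as much as the option list: a format persisted while Azure was active would otherwise still be sent to the next provider.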

kilo-code-bot · 2026-04-13T15:53:11Z

packages/kilo-vscode/webview-ui/src/utils/speech-providers/polly-provider.ts

+				method: "POST",
+				headers: {
+					"Content-Type": "application/json",
+					"X-Api-Key": apiKey,


WARNING: Polly requests cannot authenticate with this header

Amazon Polly does not accept a raw access key in X-Api-Key; browser-side calls must be SigV4-signed or proxied through a backend that signs them. As written, every synthesis request here will be rejected, so the Polly provider is effectively nonfunctional.

- VoicePreset, SpeechSettings, PronunciationEntry interfaces
- DEFAULT_SPEECH_SETTINGS with en-GB-MaisieNeural default
- Speech message types added to WebviewMessage and ExtensionMessage unions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- tts-azure.ts: Azure REST API synthesis with SSML builder (prosody, styles, emphasis, custom pronunciations)
- speech-playback.ts: Web Audio API playback with LRU cache (32 entries), volume control, abort/cancel support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
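
For readers unfamiliar with the pattern, a bounded LRU cache like the 32-entry one mentioned in this commit can be sketched on top of `Map`, which iterates keys in insertion order. This is a generic illustration with a hypothetical `LruCache` class, not the code in `speech-playback.ts`.

```typescript
// Generic LRU sketch: the first key in the Map is always the least recently used.
class LruCache<V> {
  private entries = new Map<string, V>()
  private capacity: number

  constructor(capacity: number) {
    this.capacity = capacity
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key)
    if (value === undefined) return undefined
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key)
    this.entries.set(key, value)
    return value
  }

  set(key: string, value: V): void {
    this.entries.delete(key)
    this.entries.set(key, value)
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
  }

  get size(): number {
    return this.entries.size
  }
}
```

A playback engine would construct it as `new LruCache<Blob>(32)` or similar, keyed by the synthesis cache key.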

Section 1: Connection + Global (collapsible)
- API key, region, enable/auto-speak toggles, volume, interaction mode, sentiment

Section 2: Voice Browser + Favorites
- search, locale filter, 125+ voice cards with star/preview, favorites chips bar

Section 3: Voice Fine-Tuning (collapsible)
- pitch, rate, volume, style chips, emphasis, pauses, pronunciations, presets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- sendSpeechSettings(): reads all speech config from VS Code settings
- validateAzureKey(): tests Azure TTS endpoint with a probe synthesis
- Wired into init, reset, and message handler paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- 24 speech configuration properties under kilo-code.new.speech.*
- Covers connection, global, tuning, favorites, and presets
- Default voice: en-GB-MaisieNeural
- Updated displayName to "Kilo Code: Azure Voice Edition"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Watches session busy→idle transition to speak last assistant reply
- Strips markdown/code blocks/URLs for natural speech
- Interrupts playback on keydown when interruptOnType enabled
- Stops speech on session switch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Port 25-rule speech-text-filter.ts with 5-layer guardrails from source, update App.tsx to use filterTextForSpeech + detectSentiment instead of inline regex, add Azure TTS endpoint to CSP connect-src, compact switch cases in KiloProvider to stay under max-lines lint rule. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Design for refactoring Azure-only speech into multi-provider architecture with Browser (free/offline) as default and 5 additional providers with free tiers (Azure, Google, OpenAI, ElevenLabs, Amazon Polly). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

15-task plan covering provider interface, 6 providers (Browser, Azure, Google, OpenAI, ElevenLabs, Polly), registry pattern, SpeechTab refactor, CSP/config updates, tests, and PR submission. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Define SpeechVoice, SynthesisOptions, and SpeechProvider interfaces for multi-provider speech architecture. Add SpeechProviderRegistry with register/get/list/listByTier operations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implement AzureProvider that wraps tts-azure.ts and azure-voices.ts, mapping AzureVoice to SpeechVoice with full SSML/style capabilities and testConnection support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace hard-coded Azure TTS with provider-agnostic speak() that accepts a SpeechProvider, delegates synthesis to provider.synthesize(), and handles both Blob results (Web Audio) and void results (Browser). Cache key now includes provider.id. stop() calls provider.stop() in addition to stopping any active AudioBufferSourceNode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add provider dropdown with optgroups (free / free-tier), per-provider config sections (API key, region, test button), and capability-gated tuning controls (styles, emphasis, pronunciations, audio formats). Voice browser now renders voices from the active provider instead of hard-coded Azure list. Extract ProviderConfigSection and ApiKeyRow sub-components to keep complexity manageable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e-up Expand connect-src CSP to allow Google TTS, OpenAI, ElevenLabs, and Amazon Polly endpoints. Add package.json config keys for provider selection and per-provider API credentials. Update SpeechSettings interface and DEFAULT_SPEECH_SETTINGS with provider field and optional per-provider config blocks. Wire sendSpeechSettings() to read and transmit all new provider settings to the webview. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace hard-coded Azure API key/region checks in the busy-to-idle auto-speak effect with provider-agnostic flow: resolve provider from settings, check requiresApiKey against the correct per-provider key, and pass the provider to speak(). Add getApiKeyForProvider helper to map provider ID to the right credential field. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
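
The credential lookup described in this commit might be shaped roughly as follows. The settings layout (one optional block per keyed provider with an `apiKey` field) and the `canSynthesize` helper are assumptions for illustration; only the `getApiKeyForProvider` name and the `requiresApiKey` check come from the commit message.

```typescript
// Hypothetical settings shape: one optional block per provider that needs a key.
type KeyedProviderId = "azure" | "google" | "openai" | "elevenlabs" | "polly"
type AnyProviderId = "browser" | KeyedProviderId

type CredentialSettings = { [K in KeyedProviderId]?: { apiKey?: string } }

// Map a provider id to its own credential field; empty strings count as missing.
function getApiKeyForProvider(settings: CredentialSettings, id: AnyProviderId): string | undefined {
  if (id === "browser") return undefined // the Web Speech API needs no credential
  const block = settings[id]
  return block && block.apiKey ? block.apiKey : undefined
}

// Synthesis may proceed when the provider needs no key or the matching key is present.
function canSynthesize(requiresApiKey: boolean, apiKey: string | undefined): boolean {
  return !requiresApiKey || apiKey !== undefined
}
```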

Showcase the multi-provider speech synthesis feature with 6 screenshots demonstrating provider selection, voice browser, fine-tuning controls, and both free (Browser) and premium (Azure) configurations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge pull request #1 from Kilo-Org/main

d945c43

sync to main

kilo-code-bot bot reviewed Apr 13, 2026

Ghenghis changed the title ~~feat: Azure Voice Studio — Speech synthesis for AI responses~~ feat(vscode): multi-provider speech synthesis for AI responses Apr 13, 2026

kilo-code-bot bot reviewed Apr 13, 2026

Ghenghis and others added 25 commits April 13, 2026 21:00

Merge branch 'Kilo-Org:main' into main

27ba098

feat: add Azure voice catalog with 125+ English voices

b353b7c

Extended AzureVoice interface with description and styles fields. Organized with en-GB first (Maisie as default voice). Removed EDGE_TTS references -- Azure-only edition. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: wire SpeechTab into Settings tabs

48d6a16

Added Speech tab between Context and Experimental tabs with speech-bubble icon. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: resolve lint errors in speech code

8f28e89

- Fix eqeqeq warnings (== → === for null comparisons)
- Compact KiloProvider speech methods to stay within max-lines
- Add eslint-disable for complexity in message handler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(vscode): add browser speech provider using Web Speech API

9469da0

Implement BrowserProvider wrapping window.speechSynthesis with guards for non-browser environments. Free, offline, no API key required. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(vscode): add try/catch to Azure testConnection, remove dead test …

607d7f4

…code Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(vscode): add Google Cloud TTS speech provider

77de4ad

Neural2 and Studio voices across en-US, en-GB, en-AU, en-IN locales with SSML support and 4M chars/month free tier. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(vscode): add OpenAI TTS speech provider

ce0e598

10 voices (alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer) with mp3/opus/aac/flac output and Bearer auth. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(vscode): add ElevenLabs speech provider

d9eca73

10 voices with actual ElevenLabs voice IDs, xi-api-key auth, and 10K chars/month free tier. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(vscode): add Amazon Polly speech provider

3bc27a9

20 neural voices across en-GB, en-US, en-AU, en-NZ, en-ZA, en-IE, en-IN with SSML/emphasis/pronunciation support. Notes SigV4 needed for production. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(vscode): register all 6 speech providers in registry

15e984e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ghenghis and others added 2 commits April 13, 2026 21:04

test(vscode): add speech text filter and sentiment detection tests

2b07e5c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ghenghis force-pushed the feat/azure-voice-studio branch from d6cfe12 to 2b07e5c Compare April 14, 2026 04:06

Ghenghis wants to merge 29 commits into Kilo-Org:main from AiDave71:feat/azure-voice-studio