3 changes: 3 additions & 0 deletions ai_agents/.env.example
Original file line number Diff line number Diff line change
@@ -75,6 +75,9 @@ DEEPSEEK_API_KEY=
STEPFUN_API_KEY=
GLADIA_API_KEY=

# SiliconFlow unified API key (LLM / TTS)
SILICONFLOW_API_KEY=

# Extension: bedrock_llm
# Extension: polly_tts
AWS_ACCESS_KEY_ID=
2 changes: 1 addition & 1 deletion ai_agents/AGENTS.md
@@ -509,7 +509,7 @@ Required `.env` variables depend on extensions used. Common ones:
- `DEEPGRAM_API_KEY`, `AZURE_ASR_API_KEY`, `AZURE_ASR_REGION`

**TTS:**
- `ELEVENLABS_TTS_KEY`, `AZURE_TTS_KEY`, `AZURE_TTS_REGION`
- `ELEVENLABS_TTS_KEY`, `AZURE_TTS_KEY`, `AZURE_TTS_REGION`, `SILICONFLOW_API_KEY`

See `.env.example` for complete list.

@@ -0,0 +1,18 @@
# siliconflow_tts2_python/
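> L2 | Parent: /mnt/e/wsf-project/ai_agents/AGENTS.md

Member list
`__init__.py`: Package declaration; keeps the Python extension directory importable.
`addon.py`: TEN addon registration entry point; exposes the `siliconflow_tts2_python` extension instance.
`config.py`: SiliconFlow TTS config model; normalizes default parameters and validates the sample rate and response format.
`extension.py`: Adapter layer over the HTTP TTS base; creates the config and client and exposes the sample rate.
`siliconflow_tts.py`: Vendor HTTP client; requests `/audio/speech`, sniffs the actual response format, and decodes MPEG into PCM chunks.
`wav_stream_parser.py`: Streaming WAV header parser; extracts PCM data only when the response really is RIFF/WAV.
`manifest.json`: Extension metadata and property schema, read by tman and the TEN runtime.
`property.json`: Default property template; fixes the SiliconFlow environment variable and the default voice.
`requirements.txt`: Python dependency declaration; includes `httpx` and `miniaudio`.
`README.md`: Extension description and minimal config example.

Rules: complete member list · one file per line · parent link · technical terms first

[PROTOCOL]: Update this header on any change, then check AGENT.md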
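
The member list describes `wav_stream_parser.py` as a streaming WAV header parser, but that file is not shown in this diff. The block below is only a minimal sketch of the general technique, assuming a plain RIFF/WAVE stream; the class name `WavHeaderStripper` and its `feed` method are hypothetical and are not taken from the PR.

```python
import io
import struct
import wave


class WavHeaderStripper:
    """Hypothetical helper: buffers bytes until the 'data' chunk, then passes PCM through."""

    def __init__(self) -> None:
        self._buffer = bytearray()
        self._header_done = False
        self.sample_rate: int | None = None
        self.channels: int | None = None

    def feed(self, chunk: bytes) -> bytes:
        """Return the PCM bytes contained in this chunk (possibly empty)."""
        if self._header_done:
            return chunk
        self._buffer.extend(chunk)
        buf = self._buffer
        if len(buf) < 12:
            return b""  # RIFF preamble not complete yet
        if buf[:4] != b"RIFF" or buf[8:12] != b"WAVE":
            raise ValueError("not a RIFF/WAVE stream")
        pos = 12
        while pos + 8 <= len(buf):
            chunk_id = bytes(buf[pos:pos + 4])
            (size,) = struct.unpack_from("<I", buf, pos + 4)
            body = pos + 8
            if chunk_id == b"data":
                # Everything after the 'data' chunk header is PCM.
                self._header_done = True
                pcm = bytes(buf[body:])
                self._buffer = bytearray()
                return pcm
            if body + size > len(buf):
                break  # chunk body incomplete; wait for more bytes
            if chunk_id == b"fmt " and size >= 8:
                _tag, self.channels, self.sample_rate = struct.unpack_from(
                    "<HHI", buf, body
                )
            pos = body + size + (size & 1)  # chunks are padded to even length
        return b""


# Build a tiny WAV in memory and feed it in 5-byte pieces.
raw = io.BytesIO()
with wave.open(raw, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(32000)
    writer.writeframes(b"\x01\x00\x02\x00")
data = raw.getvalue()

parser = WavHeaderStripper()
pcm = b"".join(parser.feed(data[i:i + 5]) for i in range(0, len(data), 5))
print(pcm, parser.sample_rate, parser.channels)
```

The sketch ignores the size field of the `data` chunk on purpose, since streamed WAV responses often carry a placeholder length there.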
@@ -0,0 +1,18 @@
# siliconflow_tts2_python/
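> L2 | Parent: /mnt/e/wsf-project/ai_agents/CLAUDE.md

Member list
`__init__.py`: Package declaration; keeps the Python extension directory importable.
`addon.py`: TEN addon registration entry point; exposes the `siliconflow_tts2_python` extension instance.
`config.py`: SiliconFlow TTS config model; normalizes default parameters and validates the sample rate and response format.
`extension.py`: Adapter layer over the HTTP TTS base; creates the config and client and exposes the sample rate.
`siliconflow_tts.py`: Vendor HTTP client; requests `/audio/speech`, sniffs the actual response format, and decodes MPEG into PCM chunks.
`wav_stream_parser.py`: Streaming WAV header parser; extracts PCM data only when the response really is RIFF/WAV.
`manifest.json`: Extension metadata and property schema, read by tman and the TEN runtime.
`property.json`: Default property template; fixes the SiliconFlow environment variable and the default voice.
`requirements.txt`: Python dependency declaration; includes `httpx` and `miniaudio`.
`README.md`: Extension description and minimal config example.

Rules: complete member list · one file per line · parent link · technical terms first

[PROTOCOL]: Update this header on any change, then check AGENT.md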
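
The member list mentions that decoded audio is handed on as PCM chunks. As a rough illustration only, the sketch below splits an already-decoded PCM buffer into fixed-duration, frame-aligned chunks. The function name `iter_pcm_chunks`, the 20 ms chunk length, and the choice to drop a trailing partial frame are all assumptions, not behavior taken from `siliconflow_tts.py`.

```python
from collections.abc import Iterator


def iter_pcm_chunks(
    pcm: bytes,
    sample_rate: int,
    chunk_ms: int = 20,
    sample_width: int = 2,
    channels: int = 1,
) -> Iterator[bytes]:
    """Yield frame-aligned slices of roughly chunk_ms milliseconds each."""
    frame_size = sample_width * channels
    frames_per_chunk = max(1, sample_rate * chunk_ms // 1000)
    step = frames_per_chunk * frame_size
    # Ignore a trailing partial frame so every chunk holds whole samples.
    usable = len(pcm) - (len(pcm) % frame_size)
    for start in range(0, usable, step):
        yield pcm[start:min(start + step, usable)]


# 3001 bytes of silence at 32 kHz mono 16-bit: 20 ms is 640 frames, or 1280 bytes.
sizes = [len(chunk) for chunk in iter_pcm_chunks(bytes(3001), sample_rate=32000)]
print(sizes)  # [1280, 1280, 440]
```

A real streaming client would carry the leftover byte over to the next network read instead of discarding it.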
@@ -0,0 +1,30 @@
# siliconflow_tts2_python

SiliconFlow TTS extension built on TEN's `AsyncTTS2HttpExtension`.

## Notes

- Uses `POST /v1/audio/speech`
- Defaults to `response_format: "mp3"` because SiliconFlow currently returns `audio/mpeg`
- The extension decodes returned MP3 into mono 16-bit PCM before handing audio to TEN
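
To make the request shape concrete, here is a minimal sketch that assembles the URL, headers, and JSON body for `POST /v1/audio/speech` without sending anything. It assumes bearer-token authentication and that the text to synthesize travels in an `input` field; the helper name `build_speech_request` is hypothetical and is not part of this extension.

```python
import json
from typing import Any


def build_speech_request(
    params: dict[str, Any], text: str
) -> tuple[str, dict[str, str], bytes]:
    """Return (url, headers, body) for a speech request; nothing is sent."""
    payload = dict(params)  # copy so the caller's params are not mutated
    # api_key and base_url configure the transport, not the request body.
    api_key = payload.pop("api_key")
    base_url = payload.pop("base_url").rstrip("/")
    payload["input"] = text  # assumed field name for the text to synthesize
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed bearer-token auth
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return f"{base_url}/audio/speech", headers, body


url, headers, body = build_speech_request(
    {
        "api_key": "sk-example",
        "base_url": "https://api.siliconflow.cn/v1",
        "model": "IndexTeam/IndexTTS-2",
        "voice": "IndexTeam/IndexTTS-2:anna",
        "response_format": "mp3",
    },
    "Hello from TEN",
)
print(url)  # https://api.siliconflow.cn/v1/audio/speech
```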

## Required Params

```json
{
  "params": {
    "api_key": "${env:SILICONFLOW_API_KEY}",
    "base_url": "https://api.siliconflow.cn/v1",
    "model": "IndexTeam/IndexTTS-2",
    "voice": "IndexTeam/IndexTTS-2:anna"
  }
}
```

## Optional Params

- `sample_rate`
- `speed`
- `gain`
- `max_tokens`
- `response_format` (`mp3`, `wav` or `pcm`)
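
Because the service may return a different container than the one requested (the notes above mention `audio/mpeg`), a client can inspect the first bytes of the response before choosing a decode path. The following is a heuristic sketch of that idea, not the code in `siliconflow_tts.py`; the function names are hypothetical.

```python
def _looks_like_mpeg_frame(head: bytes) -> bool:
    """Check for an MPEG audio frame header with no reserved field values."""
    if len(head) < 3 or head[0] != 0xFF or (head[1] & 0xE0) != 0xE0:
        return False  # the 11-bit frame sync is missing
    version = (head[1] >> 3) & 0x03
    layer = (head[1] >> 1) & 0x03
    bitrate_index = (head[2] >> 4) & 0x0F
    sample_rate_index = (head[2] >> 2) & 0x03
    return (
        version != 0x01  # reserved MPEG version
        and layer != 0x00  # reserved layer
        and bitrate_index != 0x0F  # invalid bitrate
        and sample_rate_index != 0x03  # reserved sample rate
    )


def sniff_audio_format(head: bytes) -> str:
    """Guess the format from the leading bytes: 'wav', 'mp3', or 'pcm'."""
    if len(head) >= 12 and head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:3] == b"ID3" or _looks_like_mpeg_frame(head):
        return "mp3"
    return "pcm"  # headerless data is treated as raw PCM


print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))  # wav
print(sniff_audio_format(b"\xff\xfb\x90\x00"))  # mp3
print(sniff_audio_format(b"\xff\xff\xff\xff"))  # pcm
```

The reserved-value checks matter because quiet 16-bit PCM frequently begins with `0xFF 0xFF`, which would otherwise pass a sync-bits-only test.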
@@ -0,0 +1,14 @@
#
# This file is part of TEN Framework, an open source project.
# Licensed under the Apache License, Version 2.0.
# See the LICENSE file for more information.
#
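"""
/* [INPUT]: Depends on the registration side effect of addon.py
 * [OUTPUT]: Importing the package automatically triggers addon registration for siliconflow_tts2_python
 * [POS]: Package entry point of siliconflow_tts2_python; follows the import convention of the Python addon loader
 * [PROTOCOL]: Update this header on any change, then check AGENT.md
 */
"""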

from . import addon
@@ -0,0 +1,26 @@
#
# This file is part of TEN Framework, an open source project.
# Licensed under the Apache License, Version 2.0.
# See the LICENSE file for more information.
#
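"""
/* [INPUT]: Depends on Addon and the registration decorator from ten_runtime, and on SiliconFlowTTSExtension from extension.py
 * [OUTPUT]: Provides the extension registration entry point for siliconflow_tts2_python
 * [POS]: TEN entry point of the siliconflow_tts2_python module; instantiated by the runtime by addon name
 * [PROTOCOL]: Update this header on any change, then check AGENT.md
 */
"""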

from ten_runtime import Addon, TenEnv, register_addon_as_extension

from .extension import SiliconFlowTTSExtension


@register_addon_as_extension("siliconflow_tts2_python")
class SiliconFlowTTSExtensionAddon(Addon):
    def on_create_instance(self, ten_env: TenEnv, name: str, context) -> None:
        ten_env.log_info("SiliconFlowTTSExtensionAddon on_create_instance")
        ten_env.on_create_instance_done(
            SiliconFlowTTSExtension(name), context
        )

@@ -0,0 +1,81 @@
#
# This file is part of TEN Framework, an open source project.
# Licensed under the Apache License, Version 2.0.
# See the LICENSE file for more information.
#
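"""
/* [INPUT]: Depends on Field from pydantic and on AsyncTTS2HttpConfig from ten_ai_base.tts2_http
 * [OUTPUT]: Provides the SiliconFlowTTSConfig config model and parameter validation
 * [POS]: Config normalization layer of siliconflow_tts2_python; the single source of truth for the extension and client
 * [PROTOCOL]: Update this header on any change, then check AGENT.md
 */
"""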

from pathlib import Path
from typing import Any
import copy

from pydantic import Field

from ten_ai_base import utils
from ten_ai_base.tts2_http import AsyncTTS2HttpConfig


SUPPORTED_RESPONSE_FORMATS = {"wav", "pcm", "mp3"}


class SiliconFlowTTSConfig(AsyncTTS2HttpConfig):
    dump: bool = Field(default=False, description="SiliconFlow TTS dump")
    dump_path: str = Field(
        default_factory=lambda: str(
            Path(__file__).parent / "siliconflow_tts_in.pcm"
        ),
        description="SiliconFlow TTS dump path",
    )
    params: dict[str, Any] = Field(
        default_factory=dict, description="SiliconFlow TTS params"
    )
    sample_rate: int = Field(default=32000, description="PCM sample rate")

    def update_params(self) -> None:
        # Drop any configured "input" and force streaming on.
        self.params.pop("input", None)
        self.params["stream"] = True
        # Fill in defaults only for keys the user did not set.
        self.params.setdefault("base_url", "https://api.siliconflow.cn/v1")
        self.params.setdefault("model", "IndexTeam/IndexTTS-2")
        self.params.setdefault("voice", "IndexTeam/IndexTTS-2:anna")
        self.params.setdefault("max_tokens", 2048)
        self.params.setdefault("speed", 1)
        self.params.setdefault("gain", 0)
        self.params.setdefault("response_format", "mp3")

        # Keep the field and params["sample_rate"] in sync; params wins.
        if "sample_rate" in self.params:
            self.sample_rate = int(self.params["sample_rate"])
        else:
            self.params["sample_rate"] = self.sample_rate

    def to_str(self, sensitive_handling: bool = True) -> str:
        if not sensitive_handling:
            return f"{self}"

        # Work on a copy so the live config keeps the real API key.
        config = copy.deepcopy(self)
        if config.params and "api_key" in config.params:
            config.params["api_key"] = utils.encrypt(config.params["api_key"])
        return f"{config}"

    def validate(self) -> None:
        if "api_key" not in self.params or not self.params["api_key"]:
            raise ValueError("API key is required for SiliconFlow TTS")
        if "model" not in self.params or not self.params["model"]:
            raise ValueError("Model is required for SiliconFlow TTS")
        if "voice" not in self.params or not self.params["voice"]:
            raise ValueError("Voice is required for SiliconFlow TTS")

        # Fall back to the same default that update_params() applies.
        response_format = str(self.params.get("response_format", "mp3")).lower()
        if response_format not in SUPPORTED_RESPONSE_FORMATS:
            raise ValueError(
                "SiliconFlow TTS in TEN only supports 'wav', 'pcm' or 'mp3' "
                "response_format"
            )

        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be a positive integer")
@@ -0,0 +1,46 @@
#
# This file is part of TEN Framework, an open source project.
# Licensed under the Apache License, Version 2.0.
# See the LICENSE file for more information.
#
"""
/* [INPUT]: Depends on the HTTP TTS base in ten_ai_base.tts2_http, and on config.py and siliconflow_tts.py
 * [OUTPUT]: Provides the SiliconFlowTTSExtension extension class
 * [POS]: Runtime adapter layer of siliconflow_tts2_python; connects the TEN lifecycle to the SiliconFlow client
 * [PROTOCOL]: Update this header on any change, then check AGENT.md
 */
"""

from ten_ai_base.tts2_http import (
    AsyncTTS2HttpClient,
    AsyncTTS2HttpConfig,
    AsyncTTS2HttpExtension,
)
from ten_runtime import AsyncTenEnv

from .config import SiliconFlowTTSConfig
from .siliconflow_tts import SiliconFlowTTSClient


class SiliconFlowTTSExtension(AsyncTTS2HttpExtension):
    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.config: SiliconFlowTTSConfig | None = None
        self.client: SiliconFlowTTSClient | None = None

    async def create_config(self, config_json_str: str) -> AsyncTTS2HttpConfig:
        return SiliconFlowTTSConfig.model_validate_json(config_json_str)

    async def create_client(
        self, config: AsyncTTS2HttpConfig, ten_env: AsyncTenEnv
    ) -> AsyncTTS2HttpClient:
        return SiliconFlowTTSClient(config=config, ten_env=ten_env)

    def vendor(self) -> str:
        return "siliconflow"

    def synthesize_audio_sample_rate(self) -> int:
        if self.config is None:
            return 32000
        return self.config.sample_rate

@@ -0,0 +1,82 @@
{
  "type": "extension",
  "name": "siliconflow_tts2_python",
  "version": "0.1.0",
  "dependencies": [
    {
      "type": "system",
      "name": "ten_runtime_python",
      "version": "0.11"
    },
    {
      "type": "system",
      "name": "ten_ai_base",
      "version": "0.7"
    }
  ],
  "package": {
    "include": [
      "manifest.json",
      "property.json",
      "**.tent",
      "**.py",
      "README.md",
      "requirements.txt",
      "AGENT.md",
      "CLAUDE.md"
    ]
  },
  "api": {
    "interface": [
      {
        "import_uri": "../../system/ten_ai_base/api/tts-interface.json"
      }
    ],
    "property": {
      "properties": {
        "params": {
          "type": "object",
          "properties": {
            "api_key": {
              "type": "string"
            },
            "base_url": {
              "type": "string"
            },
            "model": {
              "type": "string"
            },
            "voice": {
              "type": "string"
            },
            "sample_rate": {
              "type": "int64"
            },
            "speed": {
              "type": "float64"
            },
            "gain": {
              "type": "float64"
            },
            "max_tokens": {
              "type": "int64"
            },
            "response_format": {
              "type": "string"
            },
            "stream": {
              "type": "bool"
            }
          }
        },
        "dump": {
          "type": "bool"
        },
        "dump_path": {
          "type": "string"
        }
      }
    }
  }
}

@@ -0,0 +1,16 @@
{
  "dump": false,
  "dump_path": "./",
  "params": {
    "api_key": "${env:SILICONFLOW_API_KEY|}",
    "base_url": "https://api.siliconflow.cn/v1",
    "model": "IndexTeam/IndexTTS-2",
    "voice": "IndexTeam/IndexTTS-2:anna",
    "sample_rate": 32000,
    "speed": 1,
    "gain": 0,
    "max_tokens": 2048,
    "response_format": "mp3",
    "stream": true
  }
}
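
One quick sanity check for these defaults is to compare each `params` value against the type declared for it in `manifest.json`. The sketch below does that with plain dictionaries. The helper `check_params` is hypothetical, and the type mapping is an assumption; in particular, accepting an integer literal such as `"speed": 1` for a `float64` property is assumed here, not confirmed against the TEN runtime.

```python
from typing import Any

# Assumed mapping from manifest type names to acceptable Python values.
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "bool": lambda v: isinstance(v, bool),
    "int64": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float64": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def check_params(
    schema: dict[str, dict[str, str]], params: dict[str, Any]
) -> list[str]:
    """Return one message per param that is undeclared or has the wrong type."""
    problems = []
    for key, value in params.items():
        declared = schema.get(key)
        if declared is None:
            problems.append(f"{key}: not declared in manifest")
        elif not _TYPE_CHECKS[declared["type"]](value):
            problems.append(
                f"{key}: expected {declared['type']}, got {type(value).__name__}"
            )
    return problems


# A subset of the manifest schema and of the default params.
schema = {
    "model": {"type": "string"},
    "sample_rate": {"type": "int64"},
    "speed": {"type": "float64"},
    "stream": {"type": "bool"},
}
defaults = {
    "model": "IndexTeam/IndexTTS-2",
    "sample_rate": 32000,
    "speed": 1,
    "stream": True,
}
print(check_params(schema, defaults))  # []
print(check_params(schema, {"speed": "fast", "pitch": 2}))
```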
@@ -0,0 +1,2 @@
httpx
miniaudio