Merged
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,27 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [Unreleased]

### Added

- ConvoAI quickstart gating regression cases in `tests/eval-cases.md` — working-baseline detection, no `/join` bypass, and quickstart-skip coverage
- ConvoAI vendor-default coverage in `tests/eval-cases.md` — Python SDK-backed first-success provider combo and default-parameter checks

### Changed

- `SKILL.md`, `references/conversational-ai/README.md`: changed documentation lookup to a strict local-reference-first policy so ConvoAI requests consult bundled module references before any Level 2 live-doc fetch
- `SKILL.md`: added stronger direct-routing cues for clearly ConvoAI-specific requests such as agent demos, provider questions, and MLLM requests instead of sending them to intake first
- `references/conversational-ai/README.md`: added working-baseline routing so new-project and unproven integration requests enter a constrained quickstart path before code generation
- `references/conversational-ai/quickstarts.md`: rewritten as a locked quickstart state machine with baseline-path, readiness, and backend-path gates; preserves the existing repo/setup references after the gates resolve
- `references/conversational-ai/quickstarts.md`, `references/conversational-ai/python-sdk.md`, `references/conversational-ai/README.md`: now use the official current provider docs as the source of truth for provider matrices and vendor-specific configs, while keeping the local quickstart focused on the first-success default combo and sample-aligned env names
- `references/conversational-ai/quickstarts.md`, `references/conversational-ai/README.md`: aligned the sequence with the state machine, made the MLLM vs cascading split explicit in the vendor gate, documented baseline-path rollback behavior, and clarified that Path B may require a private repo fallback
- `references/conversational-ai/quickstarts.md`: softened the opening quickstart wording for user-facing conversations and added an explicit unsupported-provider prompt instead of implicit discouragement
- `references/conversational-ai/quickstarts.md`, `references/conversational-ai/README.md`: added a Studio Agent ID branch so Agora ConvoAI can reuse agents configured in `https://console.agora.io/studio/agents` instead of rebuilding the provider stack during quickstart
- `references/conversational-ai/conversational-ai-studio.md`: added a dedicated reference for the Agora Studio Agent ID path and clarified that it is different from the runtime `agent_id` returned by `/join`
- `references/conversational-ai/conversational-ai-studio.md`, `references/conversational-ai/quickstarts.md`, `references/conversational-ai/README.md`: documented the confirmed mapping that the Agora Studio Agent ID is passed via the request field `pipeline_id`
- `references/conversational-ai/conversational-ai-studio.md`: expanded the Studio path into a fixed request contract mirroring the preconfigured-agent flow, including field mapping, token separation, and response expectations

## [1.2.0]

### Added
21 changes: 19 additions & 2 deletions skills/agora/SKILL.md
@@ -81,18 +81,35 @@ Examples of clear requests:

- "RTC Web video call" → `references/rtc/web.md`
- "ConvoAI Python" → `references/conversational-ai/README.md`
- "I want to build a demo that talks to an agent" → `references/conversational-ai/README.md`
- "What providers does ConvoAI support?" → `references/conversational-ai/README.md`
- "I want MLLM with Gemini" → `references/conversational-ai/README.md`
- "I already have an Agent ID from Agora Studio" → `references/conversational-ai/README.md`
- "Generate RTC token in Go" → `references/server/tokens.md`

**Vague or multi-product request:** Route through `intake/SKILL.md`.
Only do this when the product is still genuinely unclear after checking for obvious
ConvoAI / RTC / RTM / Cloud Recording / Server Gateway / token-server cues.
Intake handles product identification, combination recommendations, and routing.

## Documentation Lookup

Check bundled references first (Level 1). If they don't cover the detail needed,
Check bundled references first (Level 1). Start with the most relevant local module file
for the user's product and question. If the local reference does not cover the needed detail,
fetch `https://docs.agora.io/en/llms.txt`, find the relevant URL, and fetch it (Level 2).
See [references/doc-fetching.md](references/doc-fetching.md) for the full procedure, fallback URLs, and freeze-forever decision table.

**Always fetch Level 2 before answering questions about**: TTS/ASR/LLM vendor configs, model names, full request/response schemas, error code listings, or release notes. These change frequently — do not answer from training data or memory.
**Local-first rule:** never skip the relevant local module reference just because live docs exist.
Read the local module first, then fetch Level 2 only if:

- the local file does not cover the needed detail
- the user asks for the complete latest matrix
- the question is about exact current request/response schemas
- the question is about error code listings or release notes

For ConvoAI vendor/provider questions, route to `references/conversational-ai/README.md` first.
That module decides whether the bundled ConvoAI references are enough or whether the official
current provider docs must be fetched.
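
The local-first rule above can be read as a small decision procedure. The sketch below is only an illustration of that reading: `LookupRequest`, its field names, and `lookup_plan` are hypothetical and are not part of the skill or any Agora SDK.

```python
from dataclasses import dataclass


@dataclass
class LookupRequest:
    """Hypothetical summary of a documentation question (all names are assumptions)."""
    local_covers_detail: bool
    wants_complete_latest_matrix: bool = False
    needs_exact_current_schema: bool = False
    needs_error_codes_or_release_notes: bool = False


def lookup_plan(req: LookupRequest) -> list[str]:
    """Return lookup steps in order: Level 1 always, Level 2 only when triggered."""
    # The local module reference is always read first, even when live docs exist.
    steps = ["read the relevant local module reference (Level 1)"]
    needs_level_2 = (
        not req.local_covers_detail
        or req.wants_complete_latest_matrix
        or req.needs_exact_current_schema
        or req.needs_error_codes_or_release_notes
    )
    if needs_level_2:
        steps.append(
            "fetch https://docs.agora.io/en/llms.txt, then the matching URL (Level 2)"
        )
    return steps
```

With this reading, a question fully answered by the local file yields a one-step plan, while any of the four triggers appends the Level 2 fetch after the local read rather than replacing it.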

**If MCP is unavailable or Level 2 fetch fails**: use the fallback URLs in `doc-fetching.md` to reach the official markdown docs directly. Never fabricate API parameters — always tell the user to verify against official docs if live fetch is unavailable.

46 changes: 35 additions & 11 deletions skills/agora/references/conversational-ai/README.md
@@ -2,16 +2,30 @@

REST API-driven voice AI agents. Create agents that join RTC channels and converse with users via speech. Front-end clients connect via RTC+RTM.

## Start Here: New Projects
## Routing: Classify the Request

**Building a new Conversational AI agent? Clone a quickstart repo — do not build from scratch.**
The key question: does the user already have a **working ConvoAI baseline**?

| Path | Repo | Use when |
|---|---|---|
| **Full-stack Next.js** (default) | [agent-quickstart-nextjs](https://github.com/AgoraIO-Conversational-AI/agent-quickstart-nextjs) | Single repo: Next.js API routes + React UI |
| **Python backend + React frontend** | [conversational-ai-quickstart](https://github.com/AgoraIO-Community/conversational-ai-quickstart) *(private)* | Separate Python server + standalone React client |
- **Working baseline** = an Agora ConvoAI agent has already been started successfully end to end, and the client can join the same RTC channel and interact with it.
- **Not a working baseline** = only RTC code exists, a sample repo is cloned but not proven, env vars are present, or the user only knows the target backend language.

See **[quickstarts.md](quickstarts.md)** for clone steps, env vars, and setup instructions.
| Mode | When | Route to |
|---|---|---|
| `quickstart` | Starting from scratch, first demo, wants the official baseline | [quickstarts.md](quickstarts.md) |
| `integration` | Has an app or repo, but the ConvoAI path is not proven end to end yet | [quickstarts.md](quickstarts.md) |
| `backend-implementation` | Working baseline confirmed, now needs server code or lifecycle/auth changes | [server-sdks.md](server-sdks.md), [python-sdk.md](python-sdk.md), [go-sdk.md](go-sdk.md), or [auth-flow.md](auth-flow.md) |
| `client-customization` | Working baseline confirmed, now needs transcripts, hooks, UI, or mobile client work | [agent-toolkit.md](agent-toolkit.md), [agent-client-toolkit-react.md](agent-client-toolkit-react.md), [agent-ui-kit.md](agent-ui-kit.md), [agent-toolkit-ios.md](agent-toolkit-ios.md), [agent-toolkit-android.md](agent-toolkit-android.md) |
| `studio-agent` | The user already has an Agora Studio Agent ID and wants to reuse that Studio-managed agent config | [quickstarts.md](quickstarts.md), then [conversational-ai-studio.md](conversational-ai-studio.md) |
| `advanced-feature` / `debugging` / `ops-hardening` | Working baseline confirmed, wants custom LLM, memory, webhooks, production hardening, or error diagnosis | Start in this file, then route to the relevant reference below |

### Routing Rules

- If the user does **not** have a working baseline yet, read only this file and [quickstarts.md](quickstarts.md).
- While quickstart is unresolved, do **not** generate `/join` payloads, propose a custom project structure, or jump straight into SDK code.
- Existing RTC code or a checked-out repo is not enough to skip quickstart; the ConvoAI path must already work once.
- If the user explicitly says the baseline already works, skip quickstart and route directly to the relevant implementation file.
- If the user explicitly says they already have an **Agora Studio Agent ID** from `https://console.agora.io/studio/agents`, treat that as a dedicated ConvoAI path rather than re-running the provider-choice flow.
- If the user needs Java, Ruby, PHP, C#, or another non-SDK backend language, use [auth-flow.md](auth-flow.md) after the quickstart path is chosen.
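
As a rough illustration of the routing table and rules above, the following sketch maps a classified request to the files to read. The function name, its parameters, and the choice to check the Studio Agent ID before the baseline check are assumptions made for this example; the file names come from the table.

```python
# File names are taken from the routing table; the data structure is hypothetical.
IMPLEMENTATION_FILES = {
    "backend-implementation": [
        "server-sdks.md", "python-sdk.md", "go-sdk.md", "auth-flow.md",
    ],
    "client-customization": [
        "agent-toolkit.md", "agent-client-toolkit-react.md", "agent-ui-kit.md",
        "agent-toolkit-ios.md", "agent-toolkit-android.md",
    ],
}


def route_convoai(has_working_baseline: bool,
                  has_studio_agent_id: bool = False,
                  mode: str = "backend-implementation") -> list[str]:
    """Return candidate reference files for a ConvoAI request."""
    # Assumption: a Studio Agent ID takes precedence over the baseline check.
    if has_studio_agent_id:
        return ["quickstarts.md", "conversational-ai-studio.md"]
    # Without a proven baseline, only the README and the quickstart gates apply.
    if not has_working_baseline:
        return ["README.md", "quickstarts.md"]
    # With a confirmed baseline, skip quickstart and go to the implementation files.
    # Other modes (advanced-feature, debugging, ops-hardening) start in the README.
    return IMPLEMENTATION_FILES.get(mode, ["README.md"])
```

For example, `route_convoai(False)` stays on the quickstart path even if RTC code already exists, which matches the rule that a checked-out repo alone does not count as a working baseline.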

## SDK vs. Direct REST API

@@ -51,14 +65,21 @@ ASR → LLM → TTS Receives audio + transcripts
## Documentation Lookup

The bundled references in this file cover gotchas, generation rules, and the stable
behavioral contracts. For content that changes with doc updates, use Level 2:
behavioral contracts. Read the relevant local ConvoAI reference first, then use Level 2 only
if the local file does not cover the detail needed.

For vendor/provider questions, use the official current provider docs as the source of truth
once the question moves beyond the default quickstart combo. The bundled quickstart references
are still the right source for the first-success default path, but the current provider matrix,
vendor availability, beta status, and vendor-specific configs should come from live docs.

For content that still needs live docs after the local check, use Level 2:

1. Fetch `https://docs.agora.io/en/llms.txt`
2. Scan for a URL matching your topic (e.g., `conversational-ai`, `quick-start`, `rest-api`)
3. Fetch that URL
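
Step 2 of the list above (scanning the index for a matching URL) could look roughly like the sketch below. It assumes the `llms.txt` body has already been fetched as text and that it lists plain or Markdown-linked URLs; `find_topic_urls` is a hypothetical helper, and the sample URLs in the usage example are placeholders, not real Agora paths.

```python
import re


def find_topic_urls(llms_txt: str, topic: str) -> list[str]:
    """Return URLs from an llms.txt body whose text contains the topic keyword."""
    # Match http(s) URLs up to whitespace or a closing bracket/parenthesis,
    # so Markdown links like [title](url) yield just the url part.
    urls = re.findall(r"https?://[^\s)\]>]+", llms_txt)
    needle = topic.lower()
    return [url for url in urls if needle in url.lower()]


# Usage with placeholder content (not the real index):
sample = (
    "- [Quick start](https://docs.example.com/en/conversational-ai/quick-start.md)\n"
    "- [RTC overview](https://docs.example.com/en/video-calling/overview.md)\n"
)
matches = find_topic_urls(sample, "conversational-ai")
```

Keeping the scan as a pure function over text leaves the actual fetch (and its fallback handling) to whatever tool performs Level 2 retrieval.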

Common topics to fetch via Level 2: quick-start code (Python, Go, Java), TTS/ASR/LLM
vendor configs, error code listings.
Common topics to fetch via Level 2 after the local reference check: quick-start code (Python, Go, Java), provider matrices, vendor-specific configs, full request/response schemas, newly changed vendor configs, error code listings.

For full request/response schemas, fetch the OpenAPI spec directly — it is always
current and covers every endpoint and field:
@@ -135,6 +156,7 @@ Things the official docs don't emphasize that cause frequent mistakes:
- **Use token auth as the default for new direct REST integrations.** The ConvoAI REST API accepts `Authorization: agora token=<token>` using a combined RTC + RTM token from `RtcTokenBuilder.buildTokenWithRtm`. This is **safer than Basic Auth**: tokens are scoped to a single App ID + channel, while Customer ID/Secret grants access to every project on the account. Use Basic Auth only when a user explicitly needs that mode.
- **POST `/join` success does not mean the agent is already in the RTC channel** — the request was accepted and the agent is starting. The client should wait for the RTC `user-joined` event before expecting agent audio or querying media state.
- **`/update` overwrites `params` entirely** — sending `{ "llm": { "params": { "max_tokens": 2048 } } }` erases `model` and everything else in `params`. Always send the full object.
- **Agora Studio Agent ID is not the same thing as the runtime `agent_id` returned by `/join`** — the Studio Agent ID comes from the Studio Agents page and identifies a Studio-managed agent configuration. In the Studio-managed start path, that value maps to the request field `pipeline_id`. The runtime `agent_id` identifies a started live session returned by the REST API. Do not use one in place of the other.
- **`/speak` priority enum** — `"INTERRUPT"` (immediate, default), `"APPEND"` (queued after current speech), `"IGNORE"` (skip if agent is busy). `interruptable: false` prevents users from cutting in.
- **20 PCU default limit** — max 20 concurrent agents per App ID. Exceeding returns error on `/join`. Contact Agora support to increase.
- **Event notifications require two flags** — `advanced_features.enable_rtm: true` AND `parameters.data_channel: "rtm"` in the join config. Without both, `onAgentStateChanged`/`onAgentMetrics`/`onAgentError` won't fire. Additionally: `parameters.enable_metrics: true` for metrics, `parameters.enable_error_message: true` for errors.
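
To make the `/update` gotcha above concrete, here is a minimal sketch of building an update body that carries the full `params` object instead of only the changed key. `build_llm_update` is a hypothetical helper, the model name is a placeholder, and the payload shape simply mirrors the `{ "llm": { "params": ... } }` example in the bullet; any additional wrapper the real request needs is not shown here.

```python
import copy


def build_llm_update(current_params: dict, changes: dict) -> dict:
    """Merge changes into a copy of the current params and wrap them for /update."""
    # Copy first so the caller's stored configuration is not mutated.
    merged = copy.deepcopy(current_params)
    # Top-level merge only: nested dicts in `changes` replace the existing value.
    merged.update(changes)
    return {"llm": {"params": merged}}


# Usage: `model` survives because the full object is sent, not just max_tokens.
current = {"model": "example-model", "max_tokens": 1024}
payload = build_llm_update(current, {"max_tokens": 2048})
```

The design choice is to keep the last known full `params` on the caller's side and always derive the update body from it, since a partial `params` object would overwrite the rest.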
@@ -164,10 +186,12 @@ Use the file that matches what the user is building:

| User's question / task | Read this file |
|---|---|
| Starting a new project — which repo to clone, setup, env vars | [quickstarts.md](quickstarts.md) |
| No working ConvoAI baseline yet — choose the baseline path, setup order, and readiness gates | [quickstarts.md](quickstarts.md) |
| Node.js/Python/Go backend — starting agent, auth, session lifecycle | [server-sdks.md](server-sdks.md) |
| Python SDK specifics (async, deprecations, debug) | [python-sdk.md](python-sdk.md) |
| Go SDK specifics (context, builder, status constants) | [go-sdk.md](go-sdk.md) |
| Supported vendors and current vendor-specific configs | Fetch the official ConvoAI provider docs after reading this file |
| Existing Agora Studio Agent ID from `console.agora.io/studio/agents` | [conversational-ai-studio.md](conversational-ai-studio.md) |
| Auth flow, token types, direct REST API (non-SDK languages) | [auth-flow.md](auth-flow.md) |
| Full working demo app architecture, profiles, MLLM/Gemini | [agent-samples.md](agent-samples.md) |
| Web/React client: transcripts, agent state, sendText, interrupt | [agent-toolkit.md](agent-toolkit.md) |