feat: add ChatCompletionsTransport + wire all default paths #13447
Closed
kshitijk4poor wants to merge 1 commit into main from
ed5b073 to d4d234c
d4d234c to d5fa553
Add ChatCompletionsTransport for the default api_mode used by ~16 OpenAI-compatible providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama, DeepSeek, xAI, custom, etc.).

Wire ALL transport methods to production paths in run_agent.py:
- build_kwargs: extract 210-line else branch with 13 provider-specific conditionals (Qwen portal, NVIDIA NIM, Ollama, reasoning, developer role swap, provider preferences, max_tokens defaults)
- validate_response: response.choices validation gate
- extract_cache_stats: OpenRouter prompt_tokens_details extraction
- convert_messages: codex field sanitization (identity otherwise)
- convert_tools: identity (already in OpenAI format)
- normalize_response: near-identity wrapper returning NormalizedResponse

Agent gathers state (provider detection flags, temperature, preferences) and passes as explicit params to transport.build_kwargs().

run_agent.py: -197 lines in _build_api_kwargs (12,054 -> 11,948).

26 new tests (build_kwargs: 12, validate: 4, normalize: 2, cache: 3, convert: 3, registration: 2). All transport tests pass.

PR 5 of the provider transport refactor.
d5fa553 to 7570414
is_nous=_is_nous,
is_qwen_portal=_is_qwen,
is_github_models=_is_gh,
is_nvidia_nim="integrate.api.nvidia.com" in self._base_url_lower,
teknium1 pushed a commit that referenced this pull request on Apr 22, 2026
Third concrete transport: handles the default 'chat_completions' api_mode used by ~16 OpenAI-compatible providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama, DeepSeek, xAI, Kimi, custom, etc.). Wires build_kwargs + validate_response to production paths.

Based on PR #13447 by @kshitijk4poor, with fixes:
- Preserve tool_call.extra_content (Gemini thought_signature) via ToolCall.provider_data; the original shim stripped it, causing 400 errors on multi-turn Gemini 3 thinking requests.
- Preserve reasoning_content distinctly from reasoning (DeepSeek/Moonshot) so the thinking-prefill retry check (_has_structured) still triggers.
- Port Kimi/Moonshot quirks (32000 max_tokens, top-level reasoning_effort, extra_body.thinking) that landed on main after the original PR was opened.
- Keep _qwen_prepare_chat_messages_inplace alive and call it through the transport when sanitization already deepcopied (avoids a second deepcopy).
- Skip the back-compat SimpleNamespace shim in the main normalize loop: for chat_completions, response.choices[0].message is already the right shape with .content/.tool_calls/.reasoning/.reasoning_content/.reasoning_details and per-tool-call .extra_content from the OpenAI SDK.

run_agent.py: -239 lines in _build_api_kwargs default branch extracted to the transport. build_kwargs now owns: codex-field sanitization, Qwen portal prep, developer role swap, provider preferences, max_tokens resolution (ephemeral > user > NVIDIA 16384 > Qwen 65536 > Kimi 32000 > anthropic_max_output), Kimi reasoning_effort + extra_body.thinking, OpenRouter/Nous/GitHub reasoning, Nous product attribution tags, Ollama num_ctx, custom-provider think=false, Qwen vl_high_resolution_images, request_overrides.

39 new transport tests (8 build_kwargs, 5 Kimi, 4 validate, 4 normalize including extra_content regression, 3 cache stats, 3 basic). Tests/run_agent/ targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the test_concurrent_interrupt flake present on origin/main).
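
The max_tokens precedence listed in the commit message above can be sketched roughly as follows. This is only an illustration of the stated ordering: the function name `resolve_max_tokens` and its parameter names are assumptions, not the transport's actual signature.

```python
from typing import Optional

# Provider defaults as quoted in the commit message.
NVIDIA_DEFAULT = 16384
QWEN_DEFAULT = 65536
KIMI_DEFAULT = 32000


def resolve_max_tokens(
    ephemeral: Optional[int] = None,
    user: Optional[int] = None,
    is_nvidia_nim: bool = False,
    is_qwen_portal: bool = False,
    is_kimi: bool = False,
    anthropic_max_output: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical helper: return the first applicable value in priority order."""
    if ephemeral is not None:      # per-request override wins
        return ephemeral
    if user is not None:           # then the user's configured value
        return user
    if is_nvidia_nim:
        return NVIDIA_DEFAULT
    if is_qwen_portal:
        return QWEN_DEFAULT
    if is_kimi:
        return KIMI_DEFAULT
    return anthropic_max_output    # may be None, meaning "send no max_tokens"
```

Passing the detection flags in explicitly mirrors the design described in the PR, where the agent gathers state and the transport stays free of agent internals.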
Contributor
Salvaged and merged via #13805 onto current main. Your authorship is preserved on the merge commit (83d86ce). A few adjustments were layered on top of your original:
Thanks for the substantive work; the transport layer is now 3/4 complete (Anthropic, Responses, ChatCompletions). BedrockTransport (#13467) is next.
TFITZ57 added a commit to TFITZ57/hermes-agent that referenced this pull request on Apr 23, 2026
* fix(tui): raise picker selection contrast with inverse + bold
Selected rows in the model/session/skills pickers and approval/clarify
prompts only changed from dim gray to cornsilk, which reads as low
contrast on lighter themes and LCDs (reported during TUI v2 blitz).
Switch the selected row to `inverse bold` with the brand accent color
across modelPicker, sessionPicker, skillsHub, and prompts so the
highlight is terminal-portable and unambiguous. Unselected rows stay
dim. Also extends the sessionPicker middle meta column (which was
always dim) to inherit the row's selection state.
* fix(model-switch): drop stale provider from fallback chain and env after /model
Reported during the TUI v2 blitz test: switching from openrouter to
anthropic via `/model <name> --provider anthropic` appeared to succeed,
but the next turn kept hitting openrouter — the provider the user was
deliberately moving away from.
Two gaps caused this:
1. `Agent.switch_model` reset `_fallback_activated` / `_fallback_index`
but left `_fallback_chain` intact. The chain was seeded from
`fallback_providers:` at agent init for the *original* primary, so
when the new primary returned 401 (invalid/expired Anthropic key),
`_try_activate_fallback()` picked the old provider back up without
informing the user. Prune entries matching either the old primary
(user is moving away) or the new primary (redundant) whenever the
primary provider actually changes.
2. `_apply_model_switch` persisted `HERMES_MODEL` but never updated
`HERMES_INFERENCE_PROVIDER`. Any ambient re-resolution of the runtime
(credential pool refresh, compressor rebuild, aux clients) falls
through to that env var in `resolve_requested_provider`, so it kept
reporting the original provider even after an in-memory switch.
Adds three regression tests: fallback-chain prune on primary change,
no-op on same-provider model swap, and env-var sync on explicit switch.
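
A minimal sketch of the fallback-chain prune described in point 1, assuming each chain entry is a dict with a `provider` key (the real entry shape is not shown in this commit message) and using a hypothetical helper name:

```python
def prune_fallback_chain(chain, old_primary, new_primary):
    """Drop fallback entries that point at the old or the new primary provider.

    Assumption: entries look like {"provider": "...", "model": "..."}.
    """
    if old_primary == new_primary:
        # Same-provider model swap: the chain is left untouched.
        return list(chain)
    stale = {old_primary, new_primary}
    return [entry for entry in chain if entry.get("provider") not in stale]
```

Returning a new list rather than mutating in place keeps the no-op case easy to assert in a regression test.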
* fix(tui): @folder: only yields directories, @file: only yields files
Reported during TUI v2 blitz testing: typing `@folder:` in the composer
pulled up .dockerignore, .env, .gitignore, and every other file in the
cwd alongside the actual directories. The completion loop yielded every
entry regardless of the explicit prefix and auto-rewrote each completion
to @file: vs @folder: based on is_dir — defeating the user's choice.
Also fixed a pre-existing adjacent bug: a bare `@file:` or `@folder:`
(no path) used expanded=="." as both search_dir AND match_prefix,
filtering the list to dotfiles only. When expanded is empty or ".",
search in cwd with no prefix filter.
- want_dir = prefix == "@folder:" drives an explicit is_dir filter
- preserve the typed prefix in completion text instead of rewriting
- three regression tests cover: folder-only, file-only, and the bare-
prefix case where completions keep the `@folder:` prefix
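
The filtering rule above could look roughly like this. It is a simplified sketch: `complete_paths` is a hypothetical name, and directory entries are passed in as `(name, is_dir)` pairs instead of being read from disk so the logic can be checked in isolation.

```python
def complete_paths(prefix, expanded, entries):
    """Return completions for an explicit @file: / @folder: prefix.

    entries: iterable of (name, is_dir) pairs for the directory being searched.
    """
    want_dir = prefix == "@folder:"
    # A bare prefix (empty or ".") means "search cwd with no name filter".
    match_prefix = "" if expanded in ("", ".") else expanded
    results = []
    for name, is_dir in entries:
        if is_dir != want_dir:
            continue  # honor the user's explicit choice of files vs. folders
        if name.startswith(match_prefix):
            results.append(f"{prefix}{name}")  # keep the typed prefix as-is
    return results
```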
* fix(tui): truncate long picker rows so the height stays stable
A6 added a fixed-height grid (Array.from({length: VISIBLE})), but the
row <Text> itself had no wrap prop so Ink defaulted to wrap="wrap".
A sufficiently long model or provider name would wrap to a second
visual line and bounce the overall picker height right back — which
is exactly what reappeared during the TUI v2 blitz retest on /model.
Pin every picker row (and the empty-state / padding rows) to
wrap="truncate-end" so each slot is guaranteed one line. Applies
across modelPicker, sessionPicker, and skillsHub.
* fix(tui): stabilize slash-completion dropdown height
The completion popup (e.g. typing `/model`) grew from 8 rows at
compIdx=0 up to 16 rows at compIdx≥8 — the slice end was `compIdx + 8`
so every arrow-down added another rendered row until the window filled.
Reported during TUI v2 retest: "as i scroll and more options appear,
for some reason more options appear and it expands the height".
Fixed viewport (`COMPLETION_WINDOW = 16`) centered on compIdx, clamped
so it never slides past the array bounds. Renders exactly
`min(WINDOW, completions.length)` rows every frame.
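
The centered, clamped viewport can be expressed as a small pure function. The sketch below is in Python for illustration (the TUI itself is TypeScript), and the function name is an assumption:

```python
COMPLETION_WINDOW = 16


def completion_window(comp_idx, total, window=COMPLETION_WINDOW):
    """Return (start, end) of a fixed-size slice centered on comp_idx."""
    size = min(window, total)
    start = comp_idx - size // 2
    # Clamp so the slice never slides past either end of the list.
    start = max(0, min(start, total - size))
    return start, start + size
```

Because `end - start` always equals `min(window, total)`, the rendered row count is independent of the selected index.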
* fix(tui): pager supports scrolling (up/down/page/top/bottom)
The pager overlay backing /history, /toolsets, /help and any paged slash
output only advanced with Enter/Space and closed at the end. Could not
scroll back, scroll line-by-line, or jump to endpoints.
Adds Up/Down (↑↓, j/k), PgUp (b), g/G for top/bottom, keeps existing
Enter/Space/PgDn forward-and-auto-close, and clamps offset so
over-scrolling past the last page is a no-op.
* fix(tui): preserve prior segment output on Ctrl+C interrupt
interruptTurn only flushed the in-flight streaming chunk (bufRef) to
the transcript before calling idle(), which wiped segmentMessages and
pendingSegmentTools. Every tool call and commentary line the agent had
already emitted in the current turn disappeared the moment the user
cancelled, even though that output is exactly what they want to keep
when they hit Ctrl+C (quote from the blitz feedback: "everything was
fine up until the point where you wanted to push to main").
Append each flushed segment message to the transcript first, then
render the in-flight partial with the `*[interrupted]*` marker and its
pendingSegmentTools. Sys-level "interrupted" note still fires when
there is nothing to preserve.
* fix(tui): route skills.manage through the long-handler thread pool
`/skills browse` is documented to scan 6 sources and take ~15s, but the
gateway dispatched `skills.manage` on the main RPC thread. While it
ran, every other inbound RPC — completions, new slash commands, even
`approval.respond` — blocked until the HTTP fetches finished, making
the whole TUI feel frozen. Reported during TUI v2 retest:
"/skills browse blocks everything else".
`_LONG_HANDLERS` already exists precisely for this pattern (slash.exec,
shell.exec, session.resume, etc. run on `_pool`). Add `skills.manage`
to that set so browse/search/install run off the dispatcher; the fast
`list` / `inspect` actions pay a negligible thread-pool hop.
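
The dispatch pattern described here can be sketched as below. `_LONG_HANDLERS` and `_pool` are names from the commit message; the `dispatch` function, the pool size, and the `thread_id` demo handler are assumptions made for the example.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Methods that may block for seconds and must not run on the dispatcher thread.
_LONG_HANDLERS = {"slash.exec", "shell.exec", "session.resume", "skills.manage"}
_pool = ThreadPoolExecutor(max_workers=4)  # pool size is an assumption


def dispatch(method, handler, params):
    """Run long handlers on the pool; run everything else inline."""
    if method in _LONG_HANDLERS:
        return _pool.submit(handler, params)
    done = Future()
    done.set_result(handler(params))  # fast path: already complete
    return done


def thread_id(_params):
    """Demo handler that reports which thread executed it."""
    return threading.get_ident()
```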
* improve llama.cpp skill
* fix(skills/llama-cpp): concise description, restore python bindings, fix curl
- Description truncated to 60 chars in system prompt (extract_skill_description),
so the 500-char HF workflow description never reached the agent; shortened to
'llama.cpp local GGUF inference + HF Hub model discovery.' (56 chars).
- Restore llama-cpp-python section (basic, chat+stream, embeddings,
Llama.from_pretrained) and frontmatter dependencies entry.
- Fix broken 'Authorization: Bearer ***' curl line (missing closing quote;
llama-server doesn't require auth by default).
* fix(gateway): always inject reply-to pointer, not just when quoted text is absent (#13676)
The [Replying to: "..."] prefix is disambiguation, not deduplication. When
a user explicitly replies to a prior message, the agent needs a pointer to
which specific message they're referencing — even when the quoted text
already exists somewhere in history. History can contain the same or
similar text multiple times; without an explicit pointer the agent has to
guess (or answer for both subjects), and the reply signal is silently
dropped.
Example: in a conversation comparing Japan and Italy, replying to the
"Japan is great for culture..." message and asking "What's the best time
to go?" — previously the found_in_history check suppressed the prefix
because the quoted text was already in history, leaving the agent to
guess which destination the user meant. Now the pointer is always present.
Drops the found_in_history guard added in #1594. Token overhead is
minimal (snippet capped at 500 chars on the new user turn; cached prefix
unaffected). Behavior becomes deterministic: reply sent ⇒ pointer present.
Thanks to smartyi for flagging this.
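
A minimal sketch of the unconditional pointer, assuming the prefix is simply prepended to the new user turn. The helper name and the blank-line separator are assumptions; only the `[Replying to: "..."]` shape and the 500-character cap come from the text above.

```python
MAX_SNIPPET_CHARS = 500


def with_reply_pointer(user_text, quoted_text):
    """Prefix the user turn with a pointer to the message being replied to."""
    if not quoted_text:
        return user_text  # not a reply: nothing to point at
    snippet = quoted_text[:MAX_SNIPPET_CHARS]
    # No found-in-history check: a reply always carries its pointer.
    return f'[Replying to: "{snippet}"]\n\n{user_text}'
```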
* feat(image-gen): add GPT Image 2 to FAL catalog (#13677)
Adds OpenAI's new GPT Image 2 model via FAL.ai, selectable through
`hermes tools` → Image Generation. SOTA text rendering (including CJK)
and world-aware photorealism.
- FAL_MODELS entry with image_size_preset style
- 4:3 presets on all aspect ratios — 16:9 (1024x576) falls below
GPT-Image-2's 655,360 min-pixel floor and would be rejected
- quality pinned to medium (same rule as gpt-image-1.5) for
predictable Nous Portal billing
- BYOK (openai_api_key) deliberately omitted from supports so all
users stay on shared FAL billing
- 6 new tests covering preset mapping, quality pinning, and
supports-whitelist integrity
- Docs table + aspect-ratio map updated
Live-tested end-to-end: 39.9s cold request, clean 1024x768 PNG
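
The pixel-floor reasoning can be checked with simple arithmetic. The helper below is illustrative only and is not part of the FAL catalog code:

```python
MIN_PIXELS = 655_360  # GPT-Image-2 floor quoted above


def meets_pixel_floor(width, height):
    """True when the image area reaches the minimum pixel count."""
    return width * height >= MIN_PIXELS


# 1024 x 576 = 589,824 pixels, below the floor, so the 16:9 preset is unusable.
# 1024 x 768 = 786,432 pixels, above the floor, which matches the live test size.
```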
* refactor(delegate): drop dead default_toolsets from CLI default config
delegation.default_toolsets was declared in cli.py's CLI_CONFIG default
dict and documented in cli-config.yaml.example, but never read: none of
tools/delegate_tool.py, _load_config(), or any call site ever looked it
up. The live fallback is the DEFAULT_TOOLSETS module constant at
tools/delegate_tool.py:101, which stays as-is.
hermes_cli/config.py's DEFAULT_CONFIG["delegation"] already omits the
key — this commit aligns cli.py with that.
Adds a regression test in tests/hermes_cli/test_config_drift.py so a
future refactor that re-adds the key without wiring it up to
_load_config() fails loudly.
Part of Initiative 2 / M0.5.
* docs(delegate): remove default_toolsets from example config and docs
Matches the default-config removal in the preceding commit.
default_toolsets was documented for users to set but was never actually
read at runtime, so showing it in the example config and the delegation
user guide was misleading.
No deprecation note is added: the key was always a no-op, so users who
copied it from the example continue to see no behavior change. Their
config.yaml still parses; the key is just silently unused, same as
before.
Part of Initiative 2 / M0.5.
* test(delegate): make default_toolsets regression test robust to user config
The prior form of this test asserted on CLI_CONFIG["delegation"] after
importing cli, which only passed by accident of pytest-xdist worker
scheduling. cli._hermes_home is frozen at module import time (cli.py:76),
before the tests/conftest.py autouse HERMES_HOME-isolation fixture can
fire, so CLI_CONFIG ends up populated by deep-merging the contributor's
actual ~/.hermes/config.yaml over the defaults (cli.py:359-366). Any
contributor (like me) who still has the legacy key set in their own
config causes a false failure the moment another test file in the same
xdist worker imports cli at module level.
Asserting on the source of load_cli_config() instead sidesteps all of
that: the test now checks the defaults literal directly and is
independent of user config, HERMES_HOME, import order, and worker
scheduling.
Demonstrated failure mode before this fix:
pytest tests/hermes_cli/test_config_drift.py \
tests/hermes_cli/test_skills_hub.py -o addopts=""
-> FAILED (CLI_CONFIG["delegation"] contained "default_toolsets"
from the user's ~/.hermes/config.yaml)
Part of Initiative 2 / M0.5.
* feat(gateway): recognize .pdf in MEDIA: tag extraction (#13683)
PDFs emitted by tools (report generators, document exporters, etc.) now
deliver as native attachments when wrapped in MEDIA: — same as images,
audio, and video.
Bare .pdf paths are intentionally NOT added to extract_local_files(), so
the agent can still reference PDFs in text without auto-sending them.
* fix(tui): inject VS16 so text-default emoji render as color glyphs
Models frequently emit bare codepoints like U+26A0 (⚠), U+2139 (ℹ),
U+2764 (❤), U+2714 (✔), U+2600 (☀), U+263A (☺) which, per Unicode, have
Emoji_Presentation=No and render as monochrome text-style glyphs in
terminals unless followed by VS16 (U+FE0F). Agent output leaked through
the TUI like `⚠ careful` instead of `⚠️ careful`.
Added `ensureEmojiPresentation` (lib/emoji.ts): scans for the curated
set of text-default codepoints and appends VS16 when the next char is
not already VS16, ZWJ, or a keycap-enclosing mark. Idempotent and
fast-pathed by a Unicode-range regex so ASCII-heavy text is untouched.
Applied once at the top of `Md`'s line parse. Hermes-ink's stringWidth
already accounts for VS16, so cursor/layout stays correct.
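
The VS16 injection can be approximated with a single regex substitution. The sketch below is a Python rendering of the idea, not the `lib/emoji.ts` implementation; the curated set is limited to the six codepoints named above.

```python
import re

VS16 = "\ufe0f"    # variation selector-16: request emoji presentation
ZWJ = "\u200d"     # zero-width joiner
KEYCAP = "\u20e3"  # combining enclosing keycap

# Text-default codepoints listed in the commit message.
TEXT_DEFAULT = "\u26a0\u2139\u2764\u2714\u2600\u263a"

# Match a curated codepoint unless it is already followed by VS16, ZWJ, or a keycap mark.
_NEEDS_VS16 = re.compile(f"([{TEXT_DEFAULT}])(?![{VS16}{ZWJ}{KEYCAP}])")


def ensure_emoji_presentation(text):
    """Append VS16 after text-default emoji; safe to apply more than once."""
    return _NEEDS_VS16.sub(lambda m: m.group(1) + VS16, text)
```

The negative lookahead is what makes the function idempotent: once VS16 has been appended, the same codepoint no longer matches.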
* feat(delegate): orchestrator role and configurable spawn depth (default flat)
Adds role='leaf'|'orchestrator' to delegate_task. With max_spawn_depth>=2,
an orchestrator child retains the 'delegation' toolset and can spawn its
own workers; leaf children cannot delegate further (identical to today).
Default posture is flat — max_spawn_depth=1 means a depth-0 parent's
children land at the depth-1 floor and orchestrator role silently
degrades to leaf. Users opt into nested delegation by raising
max_spawn_depth to 2 or 3 in config.yaml.
Also threads acp_command/acp_args through the main agent loop's delegate
dispatch (previously silently dropped in the schema) via a new
_dispatch_delegate_task helper, and adds a DelegateEvent enum with
legacy-string back-compat for gateway/ACP/CLI progress consumers.
Config (hermes_cli/config.py defaults):
delegation.max_concurrent_children: 3 # floor-only, no upper cap
delegation.max_spawn_depth: 1 # 1=flat (default), 2-3 unlock nested
delegation.orchestrator_enabled: true # global kill switch
Salvaged from @pefontana's PR #11215. Overrides vs. the original PR:
concurrency stays at 3 (PR bumped to 5 + cap 8 — we keep the floor only,
no hard ceiling); max_spawn_depth defaults to 1 (PR defaulted to 2 which
silently enabled one level of orchestration for every user).
Co-authored-by: pefontana <fontana.pedro93@gmail.com>
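
One way to read the depth rule is sketched below. The exact comparison is an assumption inferred from the description (a child at depth `parent_depth + 1` may orchestrate only while that depth is below `max_spawn_depth`), and `effective_role` is a hypothetical name.

```python
def effective_role(requested_role, parent_depth,
                   max_spawn_depth=1, orchestrator_enabled=True):
    """Resolve the role a child actually gets; orchestrator degrades to leaf."""
    child_depth = parent_depth + 1
    can_orchestrate = (
        requested_role == "orchestrator"
        and orchestrator_enabled            # global kill switch
        and child_depth < max_spawn_depth   # assumed form of the depth floor
    )
    return "orchestrator" if can_orchestrate else "leaf"
```

With the default `max_spawn_depth=1`, every child of a depth-0 parent resolves to `leaf`, which is the flat posture described above.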
* fix(auxiliary): refresh Nous runtime credentials after aux 401s
* docs(delegate): clarify that the parent agent, not the user, populates goal/context (#13698)
The 'subagents know nothing' warning and the 'no conversation history'
constraint both said the user provides the goal/context fields. In
practice the LLM parent agent calls delegate_task; the user configures
the feature but doesn't write delegation calls. Rewording to point at
the parent agent matches how the tool actually works.
* fix(vision): resolve Nous vision model correctly in auto-detect path
Two changes:
1. _PROVIDER_VISION_MODELS: add 'nous' -> 'xiaomi/mimo-v2-omni' entry
so the vision auto-detect chain picks the correct multimodal model.
2. resolve_provider_client: detect when the requested model is a vision
model (from _PROVIDER_VISION_MODELS or known vision model names) and
pass vision=True to _try_nous(). Previously, _try_nous() was always
called without vision=True in resolve_provider_client(), causing it to
return the default text model (gemini-3-flash-preview or mimo-v2-pro)
instead of the vision-capable mimo-v2-omni.
The _try_nous() function already handled free-tier vision correctly, but
the resolve_provider_client() path (used by the auto-detect vision chain)
never signaled that a vision task was in progress.
Verified: xiaomi/mimo-v2-omni returns HTTP 200 with image inputs on Nous
inference API. google/gemini-3-flash-preview returns 404 with images.
* chore(release): add Ifkellx to AUTHOR_MAP for PR #12687
* fix(security): TUI approval overlay accepts blind keystrokes, CLI thread-local callback invisible to agent
Two bugs that allow dangerous commands to execute without informed user consent.
TUI (Ink): useInputHandlers consumes the isBlocked return path, but Ink's
EventEmitter delivers keystrokes to ALL registered useInput listeners. The
ApprovalPrompt component receives arrow keys, number keys, and Enter even
though the overlay appears frozen. The user sees no visual feedback, but
keystrokes are processed — allowing blind approval, session-wide auto-approve
(choice "session"), or permanent allowlist writes (choice "always") without
the user knowing.
Discovered while replicating #13618 (TUI approval overlay freezes terminal).
Fix: in useInputHandlers, when overlay.approval/clarify/confirm is active,
only intercept Ctrl+C. All other keys pass through. This makes the overlay
visually responsive so the user can see what they are selecting.
CLI (prompt_toolkit): _callback_tls in terminal_tool.py is threading.local().
set_approval_callback() is called in the main thread during run(), but the
agent executes in a background thread. _get_approval_callback() returns None
in the agent thread, falling back to stdin input() which prompt_toolkit
blocks. The user sees the approval text but cannot respond — the terminal is
unusable until the 60s timeout expires with a default "deny".
Fix: set callbacks inside run_agent() (the thread target), matching the
pattern already used by acp_adapter/server.py. Clear on thread exit to avoid
stale references.
Closes #13618
* test(approval): regression guards for thread-local callback contract
Two unit tests that pin down the threading.local semantics the CLI freeze
fix (#13617 / #13618) relies on:
- main-thread registration must be invisible to child threads (documents
the underlying bug — if this ever starts passing visible, ACP's
GHSA-qg5c-hvr5-hjgr race has returned)
- child-thread registration must be visible from that same thread AND
cleared by the finally block (documents the fix pattern used by
cli.py's run_agent closure and acp_adapter/server.py)
Pairs with the fix in the preceding commit by @Societus.
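
The thread-local behavior these tests pin down can be reproduced in a few lines. This is a standalone sketch: `_callback_tls` and the setter/getter names echo the commit message, but the bodies here are assumptions, not the code in terminal_tool.py.

```python
import threading

_callback_tls = threading.local()


def set_approval_callback(callback):
    _callback_tls.callback = callback


def get_approval_callback():
    # Each thread sees only what it registered itself.
    return getattr(_callback_tls, "callback", None)


def run_in_thread(target):
    """Run target() in a new thread and return its result."""
    box = {}
    worker = threading.Thread(target=lambda: box.update(value=target()))
    worker.start()
    worker.join()
    return box["value"]


def approve(_command):
    return "deny"


# The bug: registering in the main thread is invisible to the agent thread.
set_approval_callback(approve)
seen_by_child = run_in_thread(get_approval_callback)  # None


def agent_thread_body():
    # The fix pattern: register inside the thread target, clear on exit.
    set_approval_callback(approve)
    try:
        return get_approval_callback()
    finally:
        set_approval_callback(None)
```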
* fix(vision): route Nous main-provider vision through tier-aware backend
* fix(vision): restore tier-aware Nous vision model selection (#13703)
Revert two overreaches from #13699 that forced paid Nous vision to
xiaomi/mimo-v2-omni instead of the tier-appropriate gemini-3-flash-preview:
1. Remove "nous": "xiaomi/mimo-v2-omni" from _PROVIDER_VISION_MODELS —
#13696 already routes nous main-provider vision through the strict
backend, and this entry caused any direct resolve_provider_client(
"nous", ...) aggregator-lookup path to pick the wrong model for paid.
2. Drop the 'elif vision' paid override in _try_nous() that forced
mimo-v2-omni on every Nous vision call regardless of tier. Paid
accounts now keep gemini-3-flash-preview for vision as well as text.
Free-tier behavior unchanged: still uses mimo-v2-omni for vision,
mimo-v2-pro for text (check_nous_free_tier() branch).
E2E verified:
paid vision → google/gemini-3-flash-preview
free vision → xiaomi/mimo-v2-omni
paid text → google/gemini-3-flash-preview
free text → xiaomi/mimo-v2-pro
* feat(llm-wiki): port provenance markers, source hashing, and quality signals from llm-wiki-compiler (#13700)
Three additive conventions inspired by github.com/atomicmemory/llm-wiki-compiler:
- Paragraph-level provenance: `^[raw/articles/source.md]` markers on pages synthesizing 3+ sources, so readers can trace individual claims without re-reading full source files.
- Raw source content hashing: `sha256:` in raw/ frontmatter enables re-ingest drift detection — skip unchanged sources, flag changed ones.
- Optional `confidence` and `contested` frontmatter fields let lint surface weak or disputed claims without re-reading every page's prose.
Lint gains two new checks (quality signals, source drift) and one expanded check (contradictions now surfaces frontmatter-flagged pages).
Also adds a Related Tools section pointing users who want batch/scheduled compilation at llm-wiki-compiler (Obsidian-compatible, works on the same vault).
All additions are opt-in — existing wikis need no migration. Skill version 2.0.0 -> 2.1.0.
* fix(tui): don't swallow Kimi/Qwen ~! ~? kaomoji as subscript spans
The inline markdown regex had `~([^~\s][^~]*?)~` for Pandoc-style subscript
(H~2~O, CO~2~). On models that decorate prose with kaomoji like `thing ~!`
and `cool ~?` — Kimi especially — the opener `~!` paired with the next
stray `~` on the line and dim-formatted everything between them with a
leading `_` character, mangling markdown output.
Tighten the pattern to short alphanumeric-only content (`~[A-Za-z0-9]{1,8}~`)
since real subscript never contains punctuation, spaces, or long runs.
Same tightening applied to stripInlineMarkup so width measurement stays
consistent. Classic CLI was unaffected because it renders these literally.
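
The difference between the two patterns is easy to see in isolation. The snippet uses Python's `re` for illustration (the TUI pattern is a JavaScript regex), and a capture group is added to the tightened pattern so the subscript text can be read back.

```python
import re

# Old Pandoc-style subscript pattern: any run that does not start with space or "~".
OLD_SUBSCRIPT = re.compile(r"~([^~\s][^~]*?)~")

# Tightened pattern: 1 to 8 alphanumeric characters only.
NEW_SUBSCRIPT = re.compile(r"~([A-Za-z0-9]{1,8})~")

kaomoji_line = "thing ~! and cool ~?"
chemistry = "H~2~O and CO~2~"

# The old pattern pairs "~!" with the next stray "~" and swallows the prose between them.
old_hit = OLD_SUBSCRIPT.search(kaomoji_line)
# The new pattern ignores the kaomoji but still finds real subscripts.
new_hit = NEW_SUBSCRIPT.search(kaomoji_line)
subscripts = NEW_SUBSCRIPT.findall(chemistry)
```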
* refactor(tui): clean markdown.tsx per KISS/DRY
- Drop the outer no-op capture group from INLINE_RE and restructure the
source as an ordered list of patterns-with-index-comments so each
alternative is individually greppable. Shift group indices in MdInline
down by one accordingly.
- Inline single-use helpers (parseFence, isFenceClose, isMarkdownFence,
trimBareUrl) and intermediate variables (path, lang, raw, prefix, body,
depth, task body, setext match, etc.).
- Hoist block-level regexes used inside MdImpl (FENCE_CLOSE_RE, SETEXT_RE,
BULLET_RE, TASK_RE, NUMBERED_RE, QUOTE_RE) to top-level consts so
they're compiled once instead of per-line.
- Collapse the duplicate compact-vs-normal blank-line branches into one
if/!compact gap call.
- Move Fence and MdProps types to the bottom per house style.
- Shorten splitTableRow → splitRow and use optional chaining in a few
match sites.
No behavior change; 162/162 tests pass. Net -22 LoC.
* fix(tui): /resume picker shows telegram/discord/etc sessions
Reported during TUI v2 blitz retest: /resume modal only surfaced tui/cli
rows, even though `hermes --tui --resume <id>` with a pasted telegram
session id works fine. The handler double-fetched with explicit
`source="tui"` and `source="cli"` filters and dropped everything else on
the floor.
Drop the filter — list_sessions_rich(source=None) already excludes
child sessions (subagents, compression continuations) via its default,
and users want to resume messenger sessions from inside the TUI.
Adds gateway regression coverage.
* fix(tui): up-arrow inside a multi-line buffer moves cursor, not history
Reported during TUI v2 blitz retest: typing a multi-line message with
shift-Enter and then pressing Up to edit an earlier line swapped the
whole buffer for the previous history entry instead of moving the
cursor up a line. Down then restored the draft → the buffer appeared
to "flip" between the draft and a prior prompt.
`useInputHandlers` cycles history on Up/Down, but textInput only
checked `inputBuf.length` — that only counts lines committed with a
trailing backslash, not shift-Enter newlines inside `input` itself.
Fix: detect logical lines inside the input string and move the cursor
one line up/down preserving column offset (clamp to line end when the
destination is shorter, standard editor behavior). Only fall through
to history cycling when the cursor is already on the first line (Up)
or last line (Down).
Adds unit coverage for the new `lineNav` helper.
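
The line-navigation rule might be sketched like this. It is a Python approximation of the behavior described, not the `lineNav` helper itself; returning `None` stands for "no logical line in that direction, fall through to history".

```python
def line_nav(text, cursor, direction):
    """Move the cursor one logical line up (-1) or down (+1), keeping the column.

    Returns the new cursor offset, or None when there is no line to move to.
    """
    line_start = text.rfind("\n", 0, cursor) + 1
    column = cursor - line_start
    if direction < 0:
        if line_start == 0:
            return None  # already on the first line
        prev_end = line_start - 1  # offset of the newline ending the previous line
        prev_start = text.rfind("\n", 0, prev_end) + 1
        return min(prev_start + column, prev_end)  # clamp to line end
    line_end = text.find("\n", cursor)
    if line_end == -1:
        return None  # already on the last line
    next_start = line_end + 1
    next_end = text.find("\n", next_start)
    if next_end == -1:
        next_end = len(text)
    return min(next_start + column, next_end)
```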
* fix(tui): /history shows the TUI's own transcript, scrollable
Reported during TUI v2 blitz retest: `/history` in the TUI only shows
prompts from non-TUI Hermes runs and can't scroll the window. Root
cause is the slash-worker subprocess: it's a detached HermesCLI that
never sees the TUI's turns, so its `conversation_history` starts empty
and `show_history` surfaces whatever was persisted from earlier CLI
sessions — not what the user just did inside the TUI.
Intercept `/history` as a local slash command so it dumps
`ctx.local.getHistoryItems()` — the TUI's own transcript — routed
through the pager (which scrolls after #13591). Accepts an optional
preview-length argument (default 400 chars per message).
Adds createSlashHandler coverage.
* fix(tui): tool inline_diff renders inline with the active turn
Reported during TUI v2 blitz retest: code-review diffs from tool.complete
appeared at the top of the current interaction thread, out of sequence
with the agent's messages and tool rows below them.
Root cause — `sys(inline_diff)` appends to `historyItems`, which sits
above the `StreamingAssistant` pane that renders the active turn.
Until the turn closed, the diff visually floated above everything
else happening in the same turn.
Route the diff through `turnController.appendSegmentMessage` instead
so it flushes any pending streaming text first, then lands in the
segment stream beside assistant output and tool calls. On
`message.complete` the segment list is committed to history in emit
order (diff → final text), matching what the gateway sent.
Adds a regression test that exercises tool.complete → message.complete
with an inline_diff payload and asserts both the streaming and final
placement.
* feat(delegate): cross-agent file state coordination for concurrent subagents (#13718)
* feat(models): hide OpenRouter models that don't advertise tool support
Port from Kilo-Org/kilocode#9068.
hermes-agent is tool-calling-first — every provider path assumes the
model can invoke tools. Models whose OpenRouter supported_parameters
doesn't include 'tools' (e.g. image-only or completion-only models)
cannot be driven by the agent loop and fail at the first tool call.
Filter them out of fetch_openrouter_models() so they never appear in
the model picker (`hermes model`, setup wizard, /model slash command).
Permissive when the field is missing — OpenRouter-compatible gateways
(Nous Portal, private mirrors, older snapshots) don't always populate
supported_parameters. Treat missing as 'unknown → allow' rather than
silently emptying the picker on those gateways. Only hide models
whose supported_parameters is an explicit list that omits tools.
Tests cover: tools present → kept, tools absent → dropped, field
missing → kept, malformed non-list → kept, non-dict item → kept,
empty list → dropped.
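
The permissive filter and its six test cases can be summarized in one predicate. `supports_tools` is a hypothetical name; `supported_parameters` is the field named above.

```python
def supports_tools(item):
    """Keep a model unless it explicitly advertises a parameter list without 'tools'."""
    if not isinstance(item, dict):
        return True  # non-dict item: unknown, allow
    params = item.get("supported_parameters")
    if not isinstance(params, list):
        return True  # missing or malformed field: unknown, allow
    return "tools" in params  # explicit list: must include tools


def filter_models(items):
    return [item for item in items if supports_tools(item)]
```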
* feat(delegate): cross-agent file state coordination for concurrent subagents
Prevents mangled edits when concurrent subagents touch the same file
(same process, same filesystem — the mangle scenario from #11215).
Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1:
1. FileStateRegistry (tools/file_state.py) — process-wide singleton
tracking per-agent read stamps and the last writer globally.
check_stale() names the sibling subagent in the warning when a
non-owning agent wrote after this agent's last read.
2. Per-path threading.Lock wrapped around the read-modify-write
region in write_file_tool and patch_tool. Concurrent siblings on
the same path serialize; different paths stay fully parallel.
V4A multi-file patches lock in sorted path order (deadlock-free).
3. Delegate-completion reminder in tools/delegate_tool.py: after a
subagent returns, writes_since(parent, child_start, parent_reads)
appends '[NOTE: subagent modified files the parent previously
read — re-read before editing: ...]' to entry.summary when the
child touched anything the parent had already seen.
Complements (does not replace) the existing path-overlap check in
run_agent._should_parallelize_tool_batch — batch check prevents
same-file parallel dispatch within one agent's turn (cheap prevention,
zero API cost), registry catches cross-subagent and cross-turn
staleness at write time (detection).
Behavior is warning-only, not hard-failing — matches existing project
style. Errors surface naturally: sibling writes often invalidate the
old_string in patch operations, which already errors cleanly.
Tests: tests/tools/test_file_state_registry.py — 16 tests covering
registry state transitions, per-path locking, per-path-not-global
locking, writes_since filtering, kill switch, and end-to-end
integration through the real read_file/write_file/patch handlers.
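
The per-path locking in layer 2, including the sorted acquisition order for multi-file patches, can be sketched as follows. Names such as `lock_for` and `locked_paths` are assumptions, and path normalization is left out for brevity.

```python
import threading
from contextlib import ExitStack, contextmanager

_locks = {}
_locks_guard = threading.Lock()  # protects the lock table itself


def lock_for(path):
    """Return the lock for a path, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(path, threading.Lock())


@contextmanager
def locked_paths(paths):
    """Hold the locks for all paths, always acquired in sorted order.

    A single global ordering means two agents can never each hold one lock
    while waiting for the other's, which is what rules out deadlock.
    """
    with ExitStack() as stack:
        for path in sorted(set(paths)):
            stack.enter_context(lock_for(path))
        yield
```

Writers touching different paths never contend, because each path has its own lock.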
* fix(tui): only cycle history at input boundaries on arrows
Follow-up on #13726 from blitz feedback: Up/Down history cycling should only trigger when the caret is at the start/end boundary (or the input is empty).
Previously useInputHandlers intercepted arrows whenever inputBuf was empty, which still stole Up/Down from normal multiline editing. textInput now publishes caret position through inputSelectionStore even with no active selection, and useInputHandlers gates history/queue cycling on those boundaries.
* fix(tui): keep inline diffs below tool rows and strip ANSI
Follow-up on #13729 from blitz screenshot feedback.
- When tool.complete carried inline_diff but no buffered assistant text existed, pending tool rows were still in streamPendingTools, so diff rendered above the tool row section. appendSegmentMessage now emits pending tool rows as a trail segment before appending the diff artifact.
- Strip ANSI color escapes from inline_diff payloads so we don't render loud red/green terminal palettes in the transcript.
* fix(tui): narrow /resume sources to human adapters
Follow-up on #13724: showing literally every source was too noisy.
 now fetches a wider window (, larger limit) and then filters to a curated allowlist of human-facing sources (tui/cli plus chat adapters like telegram/discord/slack/whatsapp/etc). This keeps row #7 fixed (telegram sessions visible in /resume) without surfacing internal source kinds such as tool/acp.
* fix(tui): arrow history fallback when no line exists
Follow-up on multiline arrow behavior: Up/Down now fall back to queue/history whenever there is no logical line above/below the caret (not only at absolute start/end character positions). This makes Up from the end of the top line cycle history, matching expected readline-ish behavior.
* fix(tui): render inline diffs inside assistant completion
Follow-up for #13729: segment-level system artifacts still looked detached in real flow.
Instead of appending inline_diff as a standalone segment/system row, queue sanitized diffs during tool.complete and append them as a fenced diff block to the assistant completion text on message.complete. This keeps the diff in the same message flow as the assistant response.
* fix(tui): dedupe inline_diff when assistant already echoes it
Avoid duplicate diff rendering in #13729 flow. We now skip queued inline diffs that are already present in final assistant text and dedupe repeated queued diffs by exact content.
* fix(tui): keep review-diff tool rows terse
When tool.complete already carries inline_diff, the assistant message owns the full diff block. Suppress the tool-row summary/detail in that case so the turn shows one detailed diff surface instead of a rich diff plus a duplicated tool-detail payload.
* fix(tui): dedupe inline diffs, strip CLI review-diff header
After the prior inline-diff fix, the gateway still prepends a literal
" ┊ review diff" line to inline_diff (it's terminal chrome written by
`_emit_inline_diff`). Wrapping that in a ```diff fence left that header
inside the code block. The agent also often narrates its own edit in a
second fenced diff, so the assistant message ended up stacking two
diff blocks for the same change.
- Strip the leading "┊ review diff" header from queued inline diffs
before fencing.
- Skip appending the fenced diff entirely when the assistant already
wrote its own ```diff (or ```patch) fence.
Keeps the single-surface diff UX even when the agent is chatty.
* fix(tts): use per-provider input-character caps instead of global 4000 (#13743)
A single global MAX_TEXT_LENGTH = 4000 truncated every TTS provider at
4000 chars, causing long inputs to be silently chopped even though the
underlying APIs allow much more:
- OpenAI: 4096
- xAI: 15000
- MiniMax: 10000
- ElevenLabs: 5000 / 10000 / 30000 / 40000 (model-aware)
- Gemini: ~5000
- Edge: ~5000
The schema description also told the model 'Keep under 4000 characters',
which encouraged the agent to self-chunk long briefs into multiple TTS
calls (producing 3 separate audio files instead of one).
New behavior:
- PROVIDER_MAX_TEXT_LENGTH table + ELEVENLABS_MODEL_MAX_TEXT_LENGTH
encode the documented per-provider limits.
- _resolve_max_text_length(provider, cfg) resolves:
1. tts.<provider>.max_text_length user override
2. ElevenLabs model_id lookup
3. provider default
4. 4000 fallback
- text_to_speech_tool() and stream_tts_to_speaker() both call the
resolver; old MAX_TEXT_LENGTH alias kept for back-compat.
- Schema description no longer hardcodes 4000.
Tests: 27 new unit + E2E tests; all 53 existing TTS tests and 253
voice-command/voice-cli tests still pass.
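The four-step resolution order above can be sketched as below. This is a minimal sketch, not the actual implementation: the lowercase provider keys, the shape of `cfg` (assumed to be the `tts` config section as a dict), the ElevenLabs default of 5000, and the two ElevenLabs model ids are assumptions made for illustration; the real model ids are not given in the text.

```python
# Per-provider caps as listed in the commit message; key spelling is assumed.
PROVIDER_MAX_TEXT_LENGTH = {
    "openai": 4096,
    "xai": 15000,
    "minimax": 10000,
    "elevenlabs": 5000,  # assumed default; the real value is model-aware
    "gemini": 5000,
    "edge": 5000,
}

# Placeholder model ids, purely hypothetical.
ELEVENLABS_MODEL_MAX_TEXT_LENGTH = {
    "example-model-a": 10000,
    "example-model-b": 40000,
}

FALLBACK_MAX_TEXT_LENGTH = 4000


def resolve_max_text_length(provider, cfg):
    """Resolve the input-character cap for one TTS provider."""
    provider_cfg = cfg.get(provider) or {}

    # 1. Explicit user override: tts.<provider>.max_text_length
    override = provider_cfg.get("max_text_length")
    if isinstance(override, int) and override > 0:
        return override

    # 2. ElevenLabs: look the cap up by model id
    if provider == "elevenlabs":
        model_id = provider_cfg.get("model_id")
        if model_id in ELEVENLABS_MODEL_MAX_TEXT_LENGTH:
            return ELEVENLABS_MODEL_MAX_TEXT_LENGTH[model_id]

    # 3. Provider default, then 4. global fallback
    return PROVIDER_MAX_TEXT_LENGTH.get(provider, FALLBACK_MAX_TEXT_LENGTH)
```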
* feat(skills): add baoyu-comic skill
* refactor(skills): adapt baoyu-comic for Hermes
Port the upstream baoyu-comic skill to Hermes' tool ecosystem, matching
the earlier baoyu-infographic adaptation:
- metadata namespace openclaw -> hermes (+ tags, homepage)
- drop EXTEND.md preferences system (references/config/ removed,
workflow Step 1.1 removed)
- user prompts via clarify (one question at a time) instead of
AskUserQuestion batches
- image generation via image_generate instead of baoyu-imagine, with
aspect-ratio mapping to landscape/portrait/square
- Windows/PowerShell/WSL shell snippets dropped
- file I/O referenced via Hermes write_file/read_file tools
- CLI-style --flags converted to natural-language options and
user-intent cues (skill matching has no slash command trigger)
Add PORT_NOTES.md documenting the adaptations and a sync procedure.
Art-style/tone/layout reference files are preserved verbatim from
upstream v1.56.1.
* fix(skills): address baoyu-comic PR review
- Remove PDF merge feature and scripts/ directory (no pdf-lib dep)
- Correct image_generate docs: prompt-only, returns URL; add
curl download step after every call
- Downgrade reference images to text-based trait extraction
(style/palette/scene); character sheet is agent-facing reference
- Unify source file naming on source-{slug}.md across SKILL.md
and workflow.md
* fix(skills): clarify baoyu-comic character sheet role
Page prompts are written in Step 5 from the text descriptions in
characters/characters.md — the PNG sheet generated in Step 7.1
cannot be used to write them. Reposition the PNG as a human-facing
review artifact (and reference for later regenerations / manual
edits), and drop the confusing "Character sheet | Strategy" tables
since the embedding rule is uniform.
* docs: document delegation width + depth knobs (#13745)
Fills the three gaps left by the orchestrator/width-depth salvage:
- configuration.md §Delegation: max_concurrent_children, max_spawn_depth,
orchestrator_enabled are now in the canonical config.yaml reference
with a paragraph covering defaults, clamping, role-degradation, and
the 3x3x3=27-leaf cost scaling.
- environment-variables.md: adds DELEGATION_MAX_CONCURRENT_CHILDREN to
the Agent Behavior table.
- features/delegation.md: corrects stale 'default 5, cap 8' wording
(that was from the original PR; the salvage landed on default 3 with
no ceiling and a tool error on excess instead of truncation).
* fix(website): run skill extraction automatically on npm run build/start (#13747)
website/src/pages/skills/index.tsx imports ../../data/skills.json, but
that file is git-ignored and generated at build time by
website/scripts/extract-skills.py. CI workflows (deploy-site.yml,
docs-site-checks.yml) run the script explicitly before 'npm run build',
so production and PR checks always work — but 'npm run build' on a
contributor's machine fails with:
Module not found: Can't resolve '../../data/skills.json'
because the extraction step was never wired into the npm scripts.
Adds a prebuild/prestart hook that runs extract-skills.py automatically.
If python3 or pyyaml aren't installed locally, writes an empty
skills.json instead of hard-failing — the Skills Hub page renders with
an empty state, the rest of the site builds normally, and CI (which
always has the deps) still generates the full catalog for production.
* fix(skills/baoyu-comic): absolute curl paths + clarify-timeout handling (#13775)
* fix(skills/baoyu-comic): require absolute paths for curl -o downloads
When downloading generated images across several batches of image_generate
calls, relying on persistent-shell CWD is unsafe. The terminal tool's shell
can rotate (TERMINAL_LIFETIME_SECONDS expiry, a failed cd that leaves the
shell somewhere else), and 'curl -fsSL <url> -o relative.png' then silently
writes to the wrong directory with no error.
Update the skill's Step 7 Download step to require absolute -o paths (or
workdir= on the terminal tool) and add a matching pitfall entry referencing
the Apr 2026 incident where pages 06-09 of a 10-page comic landed at the
repo root instead of comic/<slug>/. The agent then spent several turns
claiming the files existed where they didn't.
* fix(skills/baoyu-comic): handle clarify timeouts correctly in Step 2
A clarify timeout returning 'Use your best judgement to make the choice
and proceed' is NOT user consent to default the entire Step 2 questionnaire.
It is a per-question default only. Add guidance at both instruction sites
(SKILL.md User Questions section, references/workflow.md Step 2 header)
telling the agent to:
1. Continue asking the remaining questions in the sequence after a
timeout — each question is an independent consent point.
2. Surface every defaulted choice in the next user-visible message
so the user can correct it when they return. An unreported default
is indistinguishable from never having asked.
Reported live Apr 2026: agent asked style question via clarify, got a
timeout response, and silently defaulted style + narrative focus +
audience + review flags in one pass. User only learned style had
defaulted to 'ohmsha' after the comic was fully generated.
* fix(prompt): tell CLI agents not to emit MEDIA:/path tags (#13766)
The CLI has no attachment channel — MEDIA:<path> tags are only
intercepted on messaging gateway platforms (Telegram, Discord,
Slack, WhatsApp, Signal, BlueBubbles, email, etc.). On the CLI
they render as literal text, which is confusing for users.
The CLI platform hint was the one PLATFORM_HINTS entry that said
nothing about file delivery, so models trained on the messaging
hints would default to MEDIA: tags on the CLI too. Tool schemas
(browser_tool, tts_tool, etc.) also recommend MEDIA: generically.
Extend the CLI hint to explicitly discourage MEDIA: tags and tell
the agent to reference files by plain absolute path instead.
Add a regression test asserting the CLI hint carries negative
guidance about MEDIA: while messaging hints keep positive guidance.
* fix: add User-Agent claude-code/0.1.0 for Kimi /coding endpoint
- Add _is_kimi_coding_endpoint() to detect Kimi coding API
- Place Kimi check BEFORE _requires_bearer_auth to ensure User-Agent header is set
- Without this header, Kimi returns 403 on /coding/v1/messages
- Fixes kimi-2.5, kimi-for-coding, kimi-k2.6-code-preview all returning 403
* fix: auto-detect anthropic_messages mode for Kimi /coding/v1 endpoints
* fix(kimi-coding): add KIMI_CODING_API_KEY fallback + api_mode detection for /coding endpoint
* fix(kimi-coding): set anthropic_messages api_mode for /coding endpoint
* fix: Update Kimi Coding API endpoint and User-Agent
* fix: Enhance Kimi Coding API mode detection and User-Agent
* fix(kimi): reconcile sk-kimi- routing with Anthropic SDK URL semantics
Follow-ups after salvaging xiaoqiang243's kimi-for-coding patches:
- KIMI_CODE_BASE_URL: drop trailing /v1 (was /coding/v1).
The /coding endpoint speaks Anthropic Messages, and the Anthropic SDK
appends /v1/messages internally. /coding/v1 + SDK suffix produced
/coding/v1/v1/messages (a 404). /coding + SDK suffix now yields
/coding/v1/messages correctly.
- kimi-coding ProviderConfig: keep legacy default api.moonshot.ai/v1 so
non-sk-kimi- moonshot keys still authenticate. sk-kimi- keys are
already redirected to api.kimi.com/coding via _resolve_kimi_base_url.
- doctor.py: update Kimi UA to claude-code/0.1.0 (was KimiCLI/1.30.0)
and rewrite /coding base URLs to /coding/v1 for the /models health
check (Anthropic surface has no /models).
- test_kimi_env_vars: accept KIMI_CODING_API_KEY as a secondary env var.
E2E verified:
sk-kimi-<key> → https://api.kimi.com/coding/v1/messages (Anthropic)
sk-<legacy> → https://api.moonshot.ai/v1/chat/completions (OpenAI)
UA: claude-code/0.1.0, x-api-key: <sk-kimi-*>
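The doubled /v1 segment is easy to reproduce in isolation. The helper below is a hypothetical stand-in for a client that appends /v1/messages to its configured base URL, as the text says the Anthropic SDK does; it is not the SDK's own code.

```python
def anthropic_messages_url(base_url):
    """Mimic a client that appends /v1/messages to its base URL."""
    return base_url.rstrip("/") + "/v1/messages"


# Old KIMI_CODE_BASE_URL (with trailing /v1): the suffix is doubled.
old_url = anthropic_messages_url("https://api.kimi.com/coding/v1")

# New KIMI_CODE_BASE_URL (no trailing /v1): the path comes out right.
new_url = anthropic_messages_url("https://api.kimi.com/coding")

print(old_url)
print(new_url)
```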
* chore(release): map xiaoqiang243 personal email in AUTHOR_MAP
* feat: add ResponsesApiTransport + wire all Codex transport paths
Add ResponsesApiTransport wrapping codex_responses_adapter.py behind the
ProviderTransport ABC. Auto-registered via _discover_transports().
Wire ALL Codex transport methods to production paths in run_agent.py:
- build_kwargs: main _build_api_kwargs codex branch (50 lines extracted)
- normalize_response: main loop + flush + summary + retry (4 sites)
- convert_tools: memory flush tool override
- convert_messages: called internally via build_kwargs
- validate_response: response validation gate
- preflight_kwargs: request sanitization (2 sites)
Remove 7 dead legacy wrappers from AIAgent (_responses_tools,
_chat_messages_to_responses_input, _normalize_codex_response,
_preflight_codex_api_kwargs, _preflight_codex_input_items,
_extract_responses_message_text, _extract_responses_reasoning_text).
Keep 3 ID manipulation methods still used by _build_assistant_message.
Update 18 test call sites across 3 test files to call adapter functions
directly instead of through deleted AIAgent wrappers.
24 new tests. 343 codex/responses/transport tests pass (0 failures).
PR 4 of the provider transport refactor.
* fix(delegation): add hard timeout and stale detection for subagent execution (#13770)
- Wrap child.run_conversation() in a ThreadPoolExecutor with configurable
timeout (delegation.child_timeout_seconds, default 300s) to prevent
indefinite blocking when a subagent's API call or tool HTTP request hangs.
- Add heartbeat stale detection: if a child's api_call_count doesn't
advance for 5 consecutive heartbeat cycles (~2.5 min), stop touching
the parent's activity timestamp so the gateway inactivity timeout
can fire as a last resort.
- Add 'timeout' as a new exit_reason/status alongside the existing
completed/max_iterations/interrupted states.
- Use shutdown(wait=False) on the timeout executor to avoid the
ThreadPoolExecutor.__exit__ deadlock when a child is stuck on
blocking I/O.
Closes #13768
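A rough sketch of the two mechanisms above, under stated assumptions: `run_child_with_timeout` and `HeartbeatStaleDetector` are hypothetical names, and the return shape is invented for illustration. Only the use of ThreadPoolExecutor, the 300s default, the 5-cycle threshold, and shutdown(wait=False) come from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_child_with_timeout(run_conversation, timeout_seconds=300.0):
    """Run a child conversation callable with a hard wall-clock limit."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(run_conversation)
    try:
        result = future.result(timeout=timeout_seconds)
        return {"status": "completed", "result": result}
    except FutureTimeout:
        return {"status": "timeout", "result": None}
    finally:
        # wait=False: do not block on a worker stuck in blocking I/O.
        # Using the executor as a context manager would wait here instead.
        executor.shutdown(wait=False)


class HeartbeatStaleDetector:
    """Stop refreshing the parent's activity once the child stops advancing."""

    def __init__(self, max_stale_cycles=5):
        self.max_stale_cycles = max_stale_cycles
        self._last_count = None
        self._stale_cycles = 0

    def should_touch_parent(self, api_call_count):
        if api_call_count != self._last_count:
            self._last_count = api_call_count
            self._stale_cycles = 0
        else:
            self._stale_cycles += 1
        return self._stale_cycles < self.max_stale_cycles
```

Note that the timed-out worker thread is abandoned rather than killed; the timeout only unblocks the caller.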
* remove Nous Portal free-model allowlist
Drop _NOUS_ALLOWED_FREE_MODELS + filter_nous_free_models and its two call
sites. Whatever Nous Portal prices as free now shows up in the picker as-is
— no local allowlist gatekeeping. Free-tier partitioning (paid vs free in
the menu) still runs via partition_nous_models_by_tier.
* feat(aux): use Portal /api/nous/recommended-models for auxiliary models
Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.
hermes_cli/models.py
- fetch_nous_recommended_models(portal_base_url, force_refresh=False)
10-minute TTL cache, keyed per portal URL (staging vs prod don't
collide). Public endpoint, no auth required. Returns {} on any
failure so callers always get a dict.
- get_nous_recommended_aux_model(vision, free_tier=None, ...)
Tier-aware pick from the payload:
- Paid tier → paidRecommended{Vision,Compaction}Model, falling back
to freeRecommended* when the paid field is null (common during
staged rollouts of new paid models).
- Free tier → freeRecommended* only, never leaks paid models.
When free_tier is None, auto-detects via the existing
check_nous_free_tier() helper (already cached 3 min against
/api/oauth/account). Detection errors default to paid so we never
silently downgrade a paying user.
agent/auxiliary_client.py — _try_nous()
- Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
to get_nous_recommended_aux_model(vision=vision).
- Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
Portal is unreachable or returns a null recommendation.
- The Portal is now the source of truth for aux model selection; the
xiaomi allowlist we used to carry is effectively dead.
Tests (15 new)
- tests/hermes_cli/test_models.py::TestNousRecommendedModels
Fetch caching, per-portal keying, network failure, force_refresh;
paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
auto-detect, detection-error → paid default, null/blank modelName
handling.
- tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
_try_nous honors Portal recommendation for text + vision, falls
back to google/gemini-3-flash-preview on None or exception.
Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.
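The tier-aware pick and the TTL cache can be sketched as follows. This is an illustration under assumptions: `pick_recommended_aux_model` and `TTLCache` are hypothetical names, the payload is assumed to hold one object per field with a `modelName` string, and failed fetches are assumed not to be cached. The field names and the 10-minute TTL come from the text.

```python
import time


def pick_recommended_aux_model(payload, vision, free_tier):
    """Return a model name, or None when the Portal has no usable pick."""
    kind = "Vision" if vision else "Compaction"

    def model_name(field):
        entry = payload.get(field)
        if not isinstance(entry, dict):
            return None
        name = entry.get("modelName")
        if isinstance(name, str) and name.strip():
            return name.strip()
        return None  # null or blank modelName

    free = model_name(f"freeRecommended{kind}Model")
    if free_tier:
        return free  # never leak a paid model to the free tier
    return model_name(f"paidRecommended{kind}Model") or free


class TTLCache:
    """Per-key cache; a failed fetch yields {} and is not stored."""

    def __init__(self, ttl_seconds=600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get(self, key, fetch, force_refresh=False):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and not force_refresh and now - hit[0] < self._ttl:
            return hit[1]
        try:
            value = fetch(key)
        except Exception:
            return {}
        self._entries[key] = (now, value)
        return value
```

A caller would fall back to its hardcoded default whenever the pick returns None.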
* feat: add ChatCompletionsTransport + wire all default paths
Third concrete transport — handles the default 'chat_completions' api_mode used
by ~16 OpenAI-compatible providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama,
DeepSeek, xAI, Kimi, custom, etc.). Wires build_kwargs + validate_response to
production paths.
Based on PR #13447 by @kshitijk4poor, with fixes:
- Preserve tool_call.extra_content (Gemini thought_signature) via
ToolCall.provider_data — the original shim stripped it, causing 400 errors
on multi-turn Gemini 3 thinking requests.
- Preserve reasoning_content distinctly from reasoning (DeepSeek/Moonshot) so
the thinking-prefill retry check (_has_structured) still triggers.
- Port Kimi/Moonshot quirks (32000 max_tokens, top-level reasoning_effort,
extra_body.thinking) that landed on main after the original PR was opened.
- Keep _qwen_prepare_chat_messages_inplace alive and call it through the
transport when sanitization already deepcopied (avoids a second deepcopy).
- Skip the back-compat SimpleNamespace shim in the main normalize loop — for
chat_completions, response.choices[0].message is already the right shape
with .content/.tool_calls/.reasoning/.reasoning_content/.reasoning_details
and per-tool-call .extra_content from the OpenAI SDK.
run_agent.py: -239 lines in _build_api_kwargs default branch extracted to the
transport. build_kwargs now owns: codex-field sanitization, Qwen portal prep,
developer role swap, provider preferences, max_tokens resolution (ephemeral >
user > NVIDIA 16384 > Qwen 65536 > Kimi 32000 > anthropic_max_output), Kimi
reasoning_effort + extra_body.thinking, OpenRouter/Nous/GitHub reasoning,
Nous product attribution tags, Ollama num_ctx, custom-provider think=false,
Qwen vl_high_resolution_images, request_overrides.
39 new transport tests (8 build_kwargs, 5 Kimi, 4 validate, 4 normalize
including extra_content regression, 3 cache stats, 3 basic). The tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
test_concurrent_interrupt flake present on origin/main).
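The max_tokens precedence listed above reads naturally as a chain of early returns. A minimal sketch, assuming a hypothetical standalone function: the real logic lives inside build_kwargs, and the parameter names here (other than the is_nvidia_nim / is_qwen_portal flags seen in the PR diff) are invented for illustration.

```python
def resolve_max_tokens(ephemeral=None, user=None, is_nvidia_nim=False,
                       is_qwen_portal=False, is_kimi=False,
                       anthropic_max_output=None):
    """ephemeral > user > NVIDIA 16384 > Qwen 65536 > Kimi 32000 > anthropic_max_output."""
    if ephemeral is not None:
        return ephemeral
    if user is not None:
        return user
    if is_nvidia_nim:
        return 16384
    if is_qwen_portal:
        return 65536
    if is_kimi:
        return 32000
    # May be None, in which case the caller would omit max_tokens.
    return anthropic_max_output
```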
* fix(tui): don't force-open Activity on every error
Reverts the auto-expand-on-new-error effect added in 93b47d96. The
effect overrode the user's chosen detailsMode and visually interrupted
every turn. Red/yellow chevron tint remains as the passive signal —
click to read, just like Thinking and Tool calls.
* fix(tui): demote gateway log-noise from Activity to info tone
Restore the old-CLI contract where only complete failures tint Activity
red. Everything else is still visible for debugging but no longer
commandeers attention.
- gateway.stderr: always tone='info' (drops the ERRLIKE_RE regex)
- gateway.protocol_error: both pushes demoted to 'info'
- commands.catalog cold-start failure: demoted to 'info'
- approval.request: no longer duplicates the overlay into Activity
Kept as 'error': terminal `error` event, gateway.start_timeout,
gateway-exited, explicit status.update kinds.
* feat: add BedrockTransport + wire all Bedrock transport paths
Fourth and final transport — completes the transport layer with all four
api_modes covered. Wraps agent/bedrock_adapter.py behind the ProviderTransport
ABC, handles both raw boto3 dicts and already-normalized SimpleNamespace.
Wires all transport methods to production paths in run_agent.py:
- build_kwargs: _build_api_kwargs bedrock branch
- validate_response: response validation, new bedrock_converse branch
- finish_reason: new bedrock_converse branch in finish_reason extraction
Based on PR #13467 by @kshitijk4poor, with one adjustment: the main normalize
loop does NOT add a bedrock_converse branch to invoke normalize_response on
the already-normalized response. Bedrock's normalize_converse_response runs
at the dispatch site (run_agent.py:5189), so the response already has the
OpenAI-compatible .choices[0].message shape by the time the main loop sees
it. Falling through to the chat_completions else branch is correct and
sidesteps a redundant NormalizedResponse rebuild.
Transport coverage — complete:
| api_mode | Transport | build_kwargs | normalize | validate |
|--------------------|--------------------------|:------------:|:---------:|:--------:|
| anthropic_messages | AnthropicTransport | ✅ | ✅ | ✅ |
| codex_responses | ResponsesApiTransport | ✅ | ✅ | ✅ |
| chat_completions | ChatCompletionsTransport | ✅ | ✅ | ✅ |
| bedrock_converse | BedrockTransport | ✅ | ✅ | ✅ |
17 new BedrockTransport tests pass. 117 transport tests total pass.
160 bedrock/converse tests across tests/agent/ pass. Full tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
pre-existing test_concurrent_interrupt flake on origin/main).
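The shape of the transport layer in the table above can be sketched as an ABC plus a registry keyed by api_mode. This is a simplified illustration, not the project's code: the method signatures are assumptions, responses are plain dicts here, and a decorator stands in for the real `_discover_transports()` auto-registration. `DemoChatCompletionsTransport` is a hypothetical toy class.

```python
from abc import ABC, abstractmethod

_TRANSPORTS = {}


def register_transport(cls):
    """Decorator stand-in for auto-discovery: index classes by api_mode."""
    _TRANSPORTS[cls.api_mode] = cls
    return cls


class ProviderTransport(ABC):
    api_mode = ""

    @abstractmethod
    def build_kwargs(self, model, messages, **params):
        """Turn agent state into request kwargs for this API."""

    @abstractmethod
    def normalize_response(self, response):
        """Map a provider response to one common shape."""

    @abstractmethod
    def validate_response(self, response):
        """Return True when the response is usable."""


@register_transport
class DemoChatCompletionsTransport(ProviderTransport):
    api_mode = "chat_completions"

    def build_kwargs(self, model, messages, **params):
        return {"model": model, "messages": messages, **params}

    def normalize_response(self, response):
        return response  # near-identity for this api_mode

    def validate_response(self, response):
        return bool(response.get("choices"))


def get_transport(api_mode):
    return _TRANSPORTS[api_mode]()
```

With this layout the agent only needs the api_mode string to pick a transport, which is the point of the refactor described above.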
* chore(models): drop 3 models from nous portal recommended list (#13822)
Remove nvidia/nemotron-3-super-120b-a12b:free, arcee-ai/trinity-large-preview:free,
and openrouter/elephant-alpha from _PROVIDER_MODELS['nous']. The paid nemotron and
arcee-thinking variants remain.
* fix(kimi): don't send Anthropic thinking to api.kimi.com/coding (#13826)
Kimi's /coding endpoint speaks the Anthropic Messages protocol but has
its own thinking semantics: when thinking.enabled is sent, Kimi validates
the history and requires every prior assistant tool-call message to carry
OpenAI-style reasoning_content. The Anthropic path never populates that
field, and convert_messages_to_anthropic strips Anthropic thinking blocks
on third-party endpoints — so after one tool-calling turn the next request
fails with:
HTTP 400: thinking is enabled but reasoning_content is missing in
assistant tool call message at index N
Kimi on chat_completions handles thinking via extra_body in
ChatCompletionsTransport (#13503). On the Anthropic route, drop the
parameter entirely and let Kimi drive reasoning server-side.
build_anthropic_kwargs now gates the reasoning_config -> thinking block
on not _is_kimi_coding_endpoint(base_url).
Tests: 8 new parametric tests cover /coding, /coding/v1, /coding/anthropic,
/coding/ (trailing slash), explicit disabled, other third-party endpoints
still getting thinking (MiniMax), native Anthropic unaffected, and the
non-/coding Kimi root route.
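The gate can be sketched as below. This is an assumption-laden illustration: the detection rule (host api.kimi.com plus a /coding path prefix) and the `reasoning_config` shape are guesses, and `build_thinking_param` is a hypothetical simplification of the gate inside build_anthropic_kwargs. The thinking block uses the Anthropic Messages form with type and budget_tokens.

```python
from urllib.parse import urlparse


def is_kimi_coding_endpoint(base_url):
    """Assumed rule: api.kimi.com with a path of /coding or below."""
    if not base_url:
        return False
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")
    return host == "api.kimi.com" and (
        path == "/coding" or path.startswith("/coding/")
    )


def build_thinking_param(base_url, reasoning_config):
    """Return a thinking block, or None when it must be omitted."""
    if not reasoning_config or not reasoning_config.get("enabled"):
        return None
    if is_kimi_coding_endpoint(base_url):
        return None  # let Kimi drive reasoning server-side
    return {
        "type": "enabled",
        "budget_tokens": reasoning_config.get("budget_tokens", 1024),
    }
```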
* feat(models): add minimax/minimax-m2.5:free to OpenRouter catalog (#13836)
Surfaces the free variant alongside the paid minimax-m2.5 entry in
both the OPENROUTER_MODELS fallback snapshot and the nous/openrouter
provider model list.
* feat(plugins): pluggable image_gen backends + OpenAI provider (#13799)
* feat(plugins): pluggable image_gen backends + OpenAI provider
Adds an ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:
- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
`image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
`plugin.yaml` of their own).
Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.
FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.
- 41 unit tests (scanner recursion, kind parsing, gate logic,
registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
`_handle_image_generate` routes to OpenAI when configured
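The path-derived keys and depth-2 recursion can be sketched as follows. This is a minimal sketch: `scan_plugins`, `should_autoload`, and `demo_scan` are hypothetical names, and the handling of kinds other than `backend` is an assumption. Only the `plugin.yaml` manifest name, the key format, and the auto-load rule for bundled backends come from the text.

```python
import tempfile
from pathlib import Path


def scan_plugins(root):
    """Return {key: plugin_dir}, looking at most two levels deep."""
    found = {}
    for child in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        if (child / "plugin.yaml").is_file():
            found[child.name] = child
            continue
        # No manifest of its own: treat as a category namespace.
        for sub in sorted(p for p in child.iterdir() if p.is_dir()):
            if (sub / "plugin.yaml").is_file():
                found[f"{child.name}/{sub.name}"] = sub
    return found


def should_autoload(kind, bundled, enabled):
    """Bundled backends load without opt-in; everything else needs it."""
    return enabled or (bundled and kind == "backend")


def demo_scan():
    """Build a throwaway tree and return the keys the scanner derives."""
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("image_gen/openai", "tts/openai", "standalone_thing"):
            plugin_dir = Path(tmp) / rel
            plugin_dir.mkdir(parents=True)
            (plugin_dir / "plugin.yaml").write_text("kind: backend\n")
        return sorted(scan_plugins(tmp))
```

Because keys include the category directory, `image_gen/openai` and `tts/openai` stay distinct.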
* fix(image_gen/openai): don't send response_format to gpt-image-*
The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog
gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.
Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.
* feat(image_gen/openai): expose gpt-image-2 as three quality tiers
Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:
gpt-image-2-low ~15s fast iteration
gpt-image-2-medium ~40s default
gpt-image-2-high ~2min highest fidelity
Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.
Config:
image_gen.openai.model: gpt-image-2-high
# or
image_gen.model: gpt-image-2-low
# or env var for scripts/tests
OPENAI_IMAGE_MODEL=gpt-image-2-medium
Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.
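The tier-to-request mapping above amounts to a small lookup table. A sketch, assuming a hypothetical `resolve_tier` helper and assuming that unknown ids fall back to the medium default (the text only says medium is the default):

```python
API_MODEL = "gpt-image-2"
DEFAULT_TIER = "gpt-image-2-medium"

# Tier id shown in the picker -> quality parameter sent to the API.
TIER_QUALITY = {
    "gpt-image-2-low": "low",
    "gpt-image-2-medium": "medium",
    "gpt-image-2-high": "high",
}


def resolve_tier(tier_id=None):
    """Map a picker tier id to the model and quality for the API call."""
    tier = tier_id if tier_id in TIER_QUALITY else DEFAULT_TIER
    return {"model": API_MODEL, "quality": TIER_QUALITY[tier]}
```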
* feat(tools_config): plugin image_gen providers inject themselves into picker
'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.
Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
(name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
writes image_gen.provider and routes to the plugin's list_models()
catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
FAL key when any plugin provider reports is_available().
FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.
Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.
397 tests pass across plugins/, tools_config, registry, and picker.
* fix(image_gen): close final gaps for plugin-backend parity with FAL
Two small places that still hardcoded FAL:
- hermes_cli/setup.py status line: an OpenAI-only setup showed
'Image Generation: missing FAL_KEY'. Now probes plugin providers
and reports '(OpenAI)' when one is_available() — or falls back to
'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.
- image_generate tool schema description: said 'using FAL.ai, default
FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
user-configured' — and notes the 'image' field can be a URL or an
absolute path, which the gateway delivers either way via
extract_local_files().
* feat: add Step Plan provider support (salvage #6005)
Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:
- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a
small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status
and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata
mapping, and StepFun region/model persistence
Based on #6005 by @hengm3467.
Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>
* fix(packaging): include agent.* sub-packages in pyproject.toml
The transport refactor (PRs #13862 ff.) added agent/transports/ as a
sub-package but the setuptools packages.find include list only had
"agent" (top-level files), not "agent.*" (sub-packages).
pip install / Nix builds therefore ship run_agent.py (which now imports
from agent.transports on every API call) but omit the transports
directory entirely, causing:
ModuleNotFoundError: No module named 'agent.transports'
on every LLM call for packaged installs.
Adds "agent.*" to match the existing pattern used by tools, gateway,
tui_gateway, and plugins.
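Why "agent" alone misses the sub-package can be shown with glob matching on dotted package names. This sketch simulates the include filter with the standard library's fnmatch, on the assumption that setuptools matches include patterns the same way; the package list (including `tools.browser`) is made up for illustration.

```python
from fnmatch import fnmatchcase


def included(packages, patterns):
    """Keep the packages whose dotted name matches any include pattern."""
    return [
        name for name in packages
        if any(fnmatchcase(name, pat) for pat in patterns)
    ]


discovered = ["agent", "agent.transports", "tools", "tools.browser"]

# Before the fix: "agent" matches only the top-level package.
before = included(discovered, ["agent", "tools", "tools.*"])

# After the fix: "agent.*" also picks up agent.transports.
after = included(discovered, ["agent", "agent.*", "tools", "tools.*"])

print(before)
print(after)
```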
* fix: preserve reasoning_content on Kimi replay
* feat(optional-skills): add page-agent skill under new web-development category (#13976)
Adds an optional skill that walks users through installing and using
alibaba/page-agent — a pure-JS in-page GUI agent that web developers
embed into their own webapps so end users can drive the UI with
natural language.
Three install paths: CDN demo (30s, no install), npm install into an
existing app with provider config table (Qwen/OpenAI/Ollama/OpenRouter),
and clone-from-source for dev/contributor workflow.
Clear use-case framing up front (embed AI copilot in SaaS/admin/B2B,
modernize legacy UIs, accessibility via natural language) and an
explicit NOT-for list that points users wanting server-side browser
automation back to Hermes' built-in browser tool.
Live-verified: repo builds on Node 22.22 + npm 10.9, dev:demo serves
at localhost:5174, API surface (new PageAgent{...}, panel.show(),
execute(task)) matches what the skill documents. Also verified
discovery end-to-end via OptionalSkillSource with isolated
HERMES_HOME — search/inspect/fetch all resolve
official/web-development/page-agent correctly.
New category directory: optional-skills/web-development/ with a
DESCRIPTION.md explaining the distinction from Hermes' own browser
automation (outside-in vs inside-out).
* feat(wecom): add QR scan flow and interactive setup wizard for bot credentials
* docs(wecom): document QR scan-to-create setup flow
* fix(wecom): visible poll progress + clearer no-bot-info failure + docstring note
Follow-ups on top of salvaged #13923 (@keifergu):
- Print QR poll dot every 3s instead of every 18s so "Fetching
configuration results..." doesn't look hung.
- On "status=success but no bot_info" from the WeCom query endpoint,
log the full payload at WARNING and tell the user we're falling
back to manual entry (was previously a single opaque line).
- Document in the qr_scan_for_bot_info() docstring that the
work.weixin.qq.com/ai/qc/* endpoints are the admin-console web-UI
flow, not the public developer API, and may change without notice.
Also add keifergu@tencent.com to scripts/release.py AUTHOR_MAP so
release notes attribute the feature correctly.
* feat(state): auto-prune old sessions + VACUUM state.db at startup (#13861)
* feat(state): auto-prune old sessions + VACUUM state.db at startup
state.db accumulates every session, message, and FTS5 index entry forever.
A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages
causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM
brought it to 43MB. The prune command and VACUUM are not wired to run
automatically anywhere — sessions grew unbounded until users noticed.
Changes:
- hermes_state.py: new state_meta key/value table, vacuum() method, and
maybe_auto_prune_and_vacuum() — idempotent via last-run timestamp in
state_meta so it only actually executes once per min_interval_hours
across all Hermes processes for a given HERMES_HOME. Never raises.
- hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG
(auto_prune=True, retention_days=90, vacuum_after_prune=True,
min_interval_hours=24). Added to _KNOWN_ROOT_KEYS.
- cli.py: call maintenance once at HermesCLI init (shared helper
_run_state_db_auto_maintenance reads config and delegates to DB).
- gateway/run.py: call maintenance once at GatewayRunner init.
- Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section.
Why VACUUM matters: SQLite does NOT shrink the file on DELETE — freed
pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays
bloated forever. VACUUM only runs when the prune actually removed rows,
so tight DBs don't pay the I/O cost.
Tests: 10 new tests in tests/test_hermes_state.py covering state_meta,
vacuum, idempotency, interval skipping, VACUUM-only-when-needed,
corrupt-marker recovery. All 246 existing state/config/gateway tests
still pass.
Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG
exposes the new block, load_config() returns it for fresh installs,
first call prunes+vacuums, second call within min_interval_hours skips,
and the state_meta marker persists across connection close/reopen.
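The interval gating and VACUUM-only-when-needed behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in hermes_state.py: the `sessions.started_at` column, the `last_auto_prune` marker key, and the returned dict shape are all assumptions made for the example.

```python
import sqlite3
import time


def maybe_auto_prune_and_vacuum(conn, retention_days=90, min_interval_hours=24, now=None):
    """Prune old sessions at most once per interval; VACUUM only if rows were removed.

    Hypothetical sketch: table/column names and the marker key are assumptions.
    """
    now = time.time() if now is None else now
    skipped = {"skipped": True, "pruned": 0, "vacuumed": False}
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS state_meta (key TEXT PRIMARY KEY, value TEXT)"
        )
        row = conn.execute(
            "SELECT value FROM state_meta WHERE key = 'last_auto_prune'"
        ).fetchone()
        try:
            last_run = float(row[0]) if row else 0.0
        except (TypeError, ValueError):
            last_run = 0.0  # corrupt marker: treat as "never run"
        if now - last_run < min_interval_hours * 3600:
            return skipped  # another process already ran within the interval

        cutoff = now - retention_days * 86400
        pruned = conn.execute(
            "DELETE FROM sessions WHERE started_at < ?", (cutoff,)
        ).rowcount
        conn.execute(
            "INSERT OR REPLACE INTO state_meta (key, value) VALUES ('last_auto_prune', ?)",
            (str(now),),
        )
        conn.commit()  # VACUUM cannot run inside an open transaction

        vacuumed = False
        if pruned > 0:
            conn.execute("VACUUM")  # rebuilds the file so freed pages are released
            vacuumed = True
        return {"skipped": False, "pruned": pruned, "vacuumed": vacuumed}
    except sqlite3.Error:
        return skipped  # maintenance must never break startup
```

Storing the marker in the database itself, rather than in process memory, is what lets several processes sharing one HERMES_HOME agree on when the last run happened.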
* sessions.auto_prune defaults to false (opt-in)
Session history powers session_search recall across past conversations,
so silently pruning on startup could surprise users. Ship the machinery
disabled and let users opt in when they notice state.db is hurting
performance.
- DEFAULT_CONFIG.sessions.auto_prune: True → False
- Call-site fallbacks in cli.py and gateway/run.py match the new default
(so unmigrated configs still see auto-prune disabled)
- Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff
* feat(hindsight): richer session-scoped retain metadata
- Add configurable retain_tags / retain_source / retain_user_prefix /
retain_assistant_prefix knobs for native Hindsight.
- Thread gateway session identity (user_name, chat_id, chat_name,
chat_type, thread_id) through AIAgent and MemoryManager into
MemoryProvider.initialize kwargs so providers can scope and tag
retained memories.
- Hindsight attaches the new identity fields as retain metadata,
merges per-call tool tags with configured default tags, and uses
the configurable transcript labels for auto-retained turns.
Co-authored-by: Abner <abner.the.foreman@agentmail.to>
* chore(release): map Abner email to Abnertheforeman
* refactor(qqbot): migrate qr onboard flow to sync + consolidate into onboard.py
- Replace async create_bind_task/poll_bind_result with synchronous
httpx.Client equivalents, eliminating manual event loop management
- Move _render_qr and full qr_register() entry-point into onboard.py,
mirroring the Feishu onboarding pattern
- Remove _qqbot_render_qr and _qqbot_qr_flow from gateway.py (~90 lines);
call site becomes a single qr_register() import
- Fix potential segfault: previous code called loop.close() in the EXPIRED
branch and again in the finally block (double-close crashed under uvloop)
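A synchronous poll loop of the kind this refactor moves to might look like the sketch below. It is illustrative only: `fetch_status` is a hypothetical callable standing in for the httpx.Client request, and the `PENDING` / `COMPLETED` status strings and the default interval and timeout are assumptions (only `EXPIRED` appears in the commit message). With no event loop involved, there is nothing left to close twice.

```python
import time


def poll_bind_result(fetch_status, interval=2.0, timeout=120.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll synchronously until the bind task completes, expires, or times out.

    fetch_status() is a hypothetical callable returning (status, payload).
    """
    deadline = clock() + timeout
    while clock() < deadline:
        status, payload = fetch_status()
        if status == "COMPLETED":
            return payload  # credentials or bot info returned by the server
        if status == "EXPIRED":
            return None  # QR code expired; caller can restart the flow
        sleep(interval)  # still pending: wait before asking again
    return None  # timed out without a terminal status
```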
* fix(cli): ensure project .env is sanitized before loading
* chore(release): map hharry11 email to GitHub handle
* feat(dashboard): track real API call count per session
Adds schema v7 'api_call_count' column. run_agent.py increments it by 1
per LLM API call, web_server analytics SQL aggregates it, frontend uses
the real counter instead of summing sessions.
The 'API Calls' card on the analytics dashboard previously displayed
COUNT(*) from the sessions table — the number of conversations, not
LLM requests. Each session makes 10-90 API calls through the tool loop,
so the reported number was ~30x lower than the real count.
Salvaged from PR #10140 (@kshitijk4poor). The cache-token accuracy
portions of the original PR were deferred — per-provider analytics is
the better path there, since cache_write_tokens and actual_cost_usd
are only reliably available from a subset of providers (Anthropic
native, Codex Responses, OpenRouter with usage.include).
Tests:
- schema_version v7 assertion
- migration v2 -> v7 adds api_call_count column with default 0
- update_token_counts increments api_call_count by provided delta
- absolute=True sets api_call_count directly
- /api/analytics/usage exposes total_api_calls in totals
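The difference between counting sessions and counting API calls can be illustrated with a small SQLite sketch. The table layout (a `sessions` table keyed by `id`) and the helper names below are assumptions for the example, not the project's actual schema or functions.

```python
import sqlite3


def migrate_add_api_call_count(conn):
    """Add the api_call_count column if it is missing (safe to call repeatedly)."""
    cols = {row[1] for row in conn.execute("PRAGMA table_info(sessions)")}
    if "api_call_count" not in cols:
        conn.execute(
            "ALTER TABLE sessions ADD COLUMN api_call_count INTEGER NOT NULL DEFAULT 0"
        )
    conn.commit()


def update_api_call_count(conn, session_id, api_calls=1, absolute=False):
    """Increment the counter by a delta, or overwrite it when absolute=True."""
    if absolute:
        sql = "UPDATE sessions SET api_call_count = ? WHERE id = ?"
    else:
        sql = "UPDATE sessions SET api_call_count = api_call_count + ? WHERE id = ?"
    conn.execute(sql, (api_calls, session_id))
    conn.commit()


def total_api_calls(conn):
    """Aggregate real LLM calls, instead of COUNT(*) over sessions."""
    return conn.execute(
        "SELECT COALESCE(SUM(api_call_count), 0) FROM sessions"
    ).fetchone()[0]
```

Under these assumptions, two sessions with 3 and 10 recorded calls report 13 API calls, where the old COUNT(*) query would have reported 2.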
* fix(plugins+nous): auto-coerce memory plugins; actionable Nous 401 diagnostic (#14005)
* fix(plugins): auto-coerce user-installed memory plugins to kind=exclusive
User-installed memory provider plugins…
Summary
PR 5 of the provider transport refactor (PR 1: #12975, PR 2: #13073, PR 3: #13366, PR 4: #13430).
Third concrete transport: handles the default chat_completions api_mode used by ~16 OpenAI-compatible providers. Extracts the 210-line kwargs construction block from run_agent.py with all 13 provider-specific conditionals.
What ships
agent/transports/chat_completions.py: ChatCompletionsTransport (314 lines)
All transport methods wired to production paths:
- build_kwargs(): else branch in _build_api_kwargs with 13 provider conditionals
- validate_response(): response.choices validation gate
- extract_cache_stats(): prompt_tokens_details.cached_tokens extraction
- convert_messages(): codex field sanitization (identity otherwise)
- convert_tools(): identity (already in OpenAI format)
- normalize_response(): near-identity wrapper (response.choices[0].message → NormalizedResponse)
Provider-specific conditionals now in the transport:
Impact
- run_agent.py: -197 lines in _build_api_kwargs (12,054 → 11,948)
- _get_chat_completions_transport() lazy singleton added
Test plan