feat: add invocation benchmark package #2366

Open
fahreddinozcan wants to merge 5 commits into master from ctx7-1459-invocation-benchmark-library

@fahreddinozcan
Contributor

Summary

  • Adds packages/benchmark, a TypeScript port of the Python orchestrator eval script, as a proper monorepo package
  • Benchmarks trigger accuracy (recall, precision, false positives) across MCP and CLI integration modes
  • Supports prod vs dev comparison: prod reads rules/skills from master, dev reads from working tree and uses local MCP build
  • Runs via pnpm bench from repo root
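
For readers unfamiliar with the metrics named above, here is a minimal sketch of how recall, precision, and false positives could be computed from per-query outcomes. The `EvalOutcome` shape and the `computeTriggerMetrics` name are assumptions made for illustration; they are not taken from the package in this PR.

```typescript
// Hypothetical shape of one eval result; field names are assumptions.
interface EvalOutcome {
  shouldTrigger: boolean; // ground truth from the eval set
  didTrigger: boolean; // whether the integration was actually invoked
}

interface TriggerMetrics {
  recall: number;
  precision: number;
  falsePositives: number;
}

function computeTriggerMetrics(outcomes: EvalOutcome[]): TriggerMetrics {
  let tp = 0; // triggered and should have
  let fp = 0; // triggered but should not have
  let fn = 0; // did not trigger but should have
  for (const o of outcomes) {
    if (o.didTrigger && o.shouldTrigger) tp++;
    else if (o.didTrigger && !o.shouldTrigger) fp++;
    else if (!o.didTrigger && o.shouldTrigger) fn++;
  }
  return {
    // Guard against division by zero when a class is empty.
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    falsePositives: fp,
  };
}
```

Reporting the raw false-positive count alongside precision keeps small eval sets readable, since a single stray trigger can swing the ratio noticeably.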

Modes

| Mode | MCP server | Rule source | Skill source |
|---|---|---|---|
| mcp:prod | npm latest | master | - |
| mcp:dev | local build | working tree | - |
| cli:prod | - | master | master |
| cli:dev | - | working tree | working tree |

Ad-hoc modes also available: mcp, mcp+rule, mcp+claude.md, cli+skill, cli+rule, cli+claude.md
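
The four prod/dev modes above could be represented as a typed lookup table. This is only a sketch: the `ModeConfig` type, the `MODES` constant, and the `modesFor` helper are hypothetical names, and the values simply mirror the table (with `null` standing in for "-").

```typescript
type Source = "master" | "working tree" | null;

// Hypothetical config shape mirroring the columns of the modes table.
interface ModeConfig {
  mcpServer: "npm latest" | "local build" | null;
  ruleSource: Source;
  skillSource: Source;
}

const MODES: Record<string, ModeConfig> = {
  "mcp:prod": { mcpServer: "npm latest", ruleSource: "master", skillSource: null },
  "mcp:dev": { mcpServer: "local build", ruleSource: "working tree", skillSource: null },
  "cli:prod": { mcpServer: null, ruleSource: "master", skillSource: "master" },
  "cli:dev": { mcpServer: null, ruleSource: "working tree", skillSource: "working tree" },
};

// Select every mode belonging to one side of a prod vs dev comparison.
function modesFor(env: "prod" | "dev"): string[] {
  return Object.keys(MODES).filter((name) => name.endsWith(`:${env}`));
}
```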

TypeScript port of the orchestrator eval script as a proper monorepo
package. Benchmarks trigger accuracy across MCP and CLI integration
modes with prod/dev comparison support (prod reads from master, dev
reads from working tree).
@linear

linear bot commented Mar 31, 2026

- Add nia:prod and vs:mcp modes for competitive benchmarking
- Add versus detection to track which provider Claude picks
- Move trigger-eval.json into benchmark package, gitignore results
- Remove context7-mcp skill during cleanup to prevent test leaks
- Fix nia MCP registration (arg ordering for variadic --header)
- Add 15 new eval queries: deep-dive, source-code, research,
  github-search, niche-lib categories (75 total)
- Show provider breakdown in final comparison for versus modes
Comment on lines +16 to +35
const NIA_TOOLS = new Set([
  "search_documentation",
  "search_codebase",
  "index",
  "regex_search",
  "manage_resource",
  "get_github_file_tree",
  "nia_web_search",
  "nia_deep_research_agent",
  "read_source_content",
]);

function isContext7Tool(name: string): boolean {
  return (
    name.includes("resolve-library-id") ||
    name.includes("resolve_library_id") ||
    name.includes("query-docs") ||
    name.includes("query_docs")
  );
}
Collaborator

Do we have to update these lists whenever tools are added or renamed? :'( Also, I remember Nia having more tools; is this list accurate?

- Add vs:cli mode: Context7 CLI prod setup + Nia skill side by side
- Pull Nia skill from nozomio-labs/nia-skill repo (cached locally)
- Detect Nia skill triggers via Skill(nia) and Bash(nia) calls
- Fix versus detection to recognize CLI/skill triggers, not just MCP
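
As an illustration of the Skill(nia) and Bash(nia) detection described in this commit, a check along the following lines might work. The `ToolCall` shape, the `skill` input field, and the `isNiaSkillTrigger` name are all assumptions for this sketch and may not match the transcript format the benchmark actually parses.

```typescript
// Hypothetical record of one tool call in a transcript; field names are assumptions.
interface ToolCall {
  name: string; // e.g. "Skill" or "Bash"
  input: Record<string, unknown>; // raw tool arguments
}

function isNiaSkillTrigger(call: ToolCall): boolean {
  if (call.name === "Skill") {
    // Assumed: the skill name is passed in a `skill` field.
    return call.input["skill"] === "nia";
  }
  if (call.name === "Bash") {
    const command = call.input["command"];
    // Match `nia` only as the first token of a command segment,
    // so strings like "echo mania" or "niagara" do not count.
    return (
      typeof command === "string" &&
      /(^|[;&|]\s*)nia(\s|$)/.test(command.trim())
    );
  }
  return false;
}
```

Anchoring the match to the start of a command segment is a deliberate choice in this sketch: a plain substring test would count unrelated commands that merely contain the letters "nia".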
Use MCP server namespace prefixes (mcp__context7__, mcp__nia__) and
naming conventions (nia_*) instead of maintaining a hardcoded set of
tool names that breaks when tools are added or renamed.
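
A minimal sketch of the prefix-based approach this commit describes, replacing the hardcoded `NIA_TOOLS` set from the review thread. The `Provider` type and the `detectProvider` and `providerBreakdown` names are hypothetical, and treating a bare `nia_` prefix as a Nia tool is an assumption about how the naming convention is applied.

```typescript
type Provider = "context7" | "nia" | "other";

function detectProvider(toolName: string): Provider {
  // MCP tools are namespaced by server, so the prefix identifies the provider
  // without listing individual tool names.
  if (toolName.startsWith("mcp__context7__")) return "context7";
  // Assumption: un-namespaced tools following the nia_* convention also count as Nia.
  if (toolName.startsWith("mcp__nia__") || toolName.startsWith("nia_")) return "nia";
  return "other";
}

// Tally which provider was picked across a run, e.g. for a versus-mode summary.
function providerBreakdown(toolNames: string[]): Record<Provider, number> {
  const counts: Record<Provider, number> = { context7: 0, nia: 0, other: 0 };
  for (const name of toolNames) counts[detectProvider(name)]++;
  return counts;
}
```

With this shape, a tool added or renamed on either server is still attributed correctly as long as the server namespace stays the same.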