feat: add invocation benchmark package by fahreddinozcan · Pull Request #2366 · upstash/context7

fahreddinozcan · 2026-03-31T08:03:44Z

Summary

- Adds `packages/benchmark`, a TypeScript port of the Python orchestrator eval script, as a proper monorepo package
- Benchmarks trigger accuracy (recall, precision, false positives) across MCP and CLI integration modes
- Supports prod vs dev comparison: prod reads rules/skills from master; dev reads from the working tree and uses the local MCP build
- Runs via `pnpm bench` from the repo root
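
For readers unfamiliar with the metrics named above, here is a minimal sketch of how recall, precision, and false positives could be derived from per-query results. The `EvalResult` shape and the `computeMetrics` name are assumptions made for illustration; the package's actual types are not shown in this PR excerpt.

```typescript
// Hypothetical result shape: one entry per eval query (not taken from the package).
interface EvalResult {
  shouldTrigger: boolean; // expected label from the eval dataset
  triggered: boolean; // whether the integration was actually invoked
}

interface TriggerMetrics {
  recall: number;
  precision: number;
  falsePositives: number;
}

function computeMetrics(results: EvalResult[]): TriggerMetrics {
  let tp = 0; // triggered and should have
  let fp = 0; // triggered but should not have
  let fn = 0; // did not trigger but should have
  for (const r of results) {
    if (r.triggered && r.shouldTrigger) tp++;
    else if (r.triggered && !r.shouldTrigger) fp++;
    else if (!r.triggered && r.shouldTrigger) fn++;
  }
  return {
    // Guard against division by zero when a class is empty.
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    falsePositives: fp,
  };
}
```

Reporting the raw false-positive count alongside precision keeps small eval sets readable, since a single spurious trigger can move the ratio noticeably.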

Modes

| Mode | MCP server | Rule source | Skill source |
| --- | --- | --- | --- |
| `mcp:prod` | npm latest | master | - |
| `mcp:dev` | local build | working tree | - |
| `cli:prod` | - | master | master |
| `cli:dev` | - | working tree | working tree |

Ad-hoc modes also available: mcp, mcp+rule, mcp+claude.md, cli+skill, cli+rule, cli+claude.md
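
The four named modes in the table can be read as a small lookup from mode name to setup sources. The sketch below is one way to encode that table; the `ModeConfig` type, the `MODES` constant, and the `isDevMode` helper are hypothetical names and not the package's actual API.

```typescript
// Hypothetical encoding of the modes table; names are illustrative only.
type McpServer = "npm latest" | "local build" | null;
type Source = "master" | "working tree" | null;

interface ModeConfig {
  mcpServer: McpServer; // null where the table shows "-"
  ruleSource: Source;
  skillSource: Source;
}

const MODES: Record<string, ModeConfig> = {
  "mcp:prod": { mcpServer: "npm latest", ruleSource: "master", skillSource: null },
  "mcp:dev": { mcpServer: "local build", ruleSource: "working tree", skillSource: null },
  "cli:prod": { mcpServer: null, ruleSource: "master", skillSource: "master" },
  "cli:dev": { mcpServer: null, ruleSource: "working tree", skillSource: "working tree" },
};

// Dev modes are the ones that read from the working tree.
function isDevMode(mode: string): boolean {
  return MODES[mode]?.ruleSource === "working tree";
}
```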

TypeScript port of the orchestrator eval script as a proper monorepo package. Benchmarks trigger accuracy across MCP and CLI integration modes with prod/dev comparison support (prod reads from master, dev reads from working tree).

linear · 2026-03-31T08:03:48Z

CTX7-1459 invocation benchmark library

- Add nia:prod and vs:mcp modes for competitive benchmarking
- Add versus detection to track which provider Claude picks
- Move trigger-eval.json into benchmark package, gitignore results
- Remove context7-mcp skill during cleanup to prevent test leaks
- Fix nia MCP registration (arg ordering for variadic --header)
- Add 15 new eval queries: deep-dive, source-code, research, github-search, niche-lib categories (75 total)
- Show provider breakdown in final comparison for versus modes
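
As a rough illustration of the provider breakdown mentioned in the last item, the sketch below tallies which provider was picked for each query in a versus run. The `Provider` type and the `providerBreakdown` function are assumed names; the actual reporting code is not part of this excerpt.

```typescript
// Hypothetical per-query outcome in a versus mode; "none" means no provider was invoked.
type Provider = "context7" | "nia" | "none";

function providerBreakdown(picks: Provider[]): Record<Provider, number> {
  const counts: Record<Provider, number> = { context7: 0, nia: 0, none: 0 };
  for (const pick of picks) {
    counts[pick]++;
  }
  return counts;
}
```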

enesgules · 2026-03-31T12:07:10Z

packages/benchmark/src/detection.ts

+const NIA_TOOLS = new Set([
+  "search_documentation",
+  "search_codebase",
+  "index",
+  "regex_search",
+  "manage_resource",
+  "get_github_file_tree",
+  "nia_web_search",
+  "nia_deep_research_agent",
+  "read_source_content",
+]);
+
+function isContext7Tool(name: string): boolean {
+  return (
+    name.includes("resolve-library-id") ||
+    name.includes("resolve_library_id") ||
+    name.includes("query-docs") ||
+    name.includes("query_docs")
+  );
+}


Do we have to update these lists whenever new tools are added or tool names change? :'( Also, I remember Nia having more tools; is this list accurate?

- Add vs:cli mode: Context7 CLI prod setup + Nia skill side by side
- Pull Nia skill from nozomio-labs/nia-skill repo (cached locally)
- Detect Nia skill triggers via Skill(nia) and Bash(nia) calls
- Fix versus detection to recognize CLI/skill triggers, not just MCP
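
The Skill(nia) and Bash(nia) detection described above might look roughly like the following. This is only a sketch: the `ToolCall` shape, the `skill` and `command` input field names, and the `isNiaSkillTrigger` function are all assumptions, since the recorded transcript format is not shown in this PR excerpt.

```typescript
// Hypothetical shape of a recorded tool call; the real transcript format may differ.
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

function isNiaSkillTrigger(call: ToolCall): boolean {
  if (call.name === "Skill") {
    // Assumed field name: the skill being invoked.
    return call.input["skill"] === "nia";
  }
  if (call.name === "Bash") {
    // Assumed field name: the shell command. Match only when `nia` is the
    // command itself, so that e.g. `niagara` does not count as a trigger.
    const command = call.input["command"];
    return typeof command === "string" && /^\s*nia(\s|$)/.test(command);
  }
  return false;
}
```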

Use MCP server namespace prefixes (mcp__context7__, mcp__nia__) and naming conventions (nia_*) instead of maintaining a hardcoded set of tool names that breaks when tools are added or renamed.
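
A minimal sketch of what prefix-based detection could look like, using only the prefixes named in the commit message (`mcp__context7__`, `mcp__nia__`, `nia_`). The `detectProvider` function and `DetectedProvider` type are hypothetical; the actual code in `detection.ts` after this change is not shown here.

```typescript
// Hypothetical helper: classify a tool name by namespace prefix instead of a fixed list.
type DetectedProvider = "context7" | "nia" | null;

function detectProvider(toolName: string): DetectedProvider {
  if (toolName.startsWith("mcp__context7__")) return "context7";
  // Nia tools are matched either by MCP namespace or by the nia_* naming convention.
  if (toolName.startsWith("mcp__nia__") || toolName.startsWith("nia_")) return "nia";
  return null;
}
```

With this approach a newly added or renamed tool under either server's namespace is classified without touching the detection code, which addresses the maintenance concern raised in review.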

feat: add invocation benchmark package

c13d581

fahreddinozcan added 2 commits March 31, 2026 11:11

feat: add eval dataset and point skill source to working tree

9a45f2f

enesgules reviewed Mar 31, 2026

enesgules approved these changes Mar 31, 2026

fahreddinozcan added 2 commits March 31, 2026 15:09

fix: replace hardcoded tool lists with namespace prefix detection

91f9d94

fahreddinozcan wants to merge 5 commits into master from ctx7-1459-invocation-benchmark-library
