---
name: mcp-attribution-worktree
description: Triage, repair, and close MCP attribution issues from the local report API with evidence-driven decisions and isolated Worktrunk worktrees. Use this skill when Codex needs to process `tool` attribution issues and skills-related attribution issues, inspect related runs, decide whether the issue is actionable in `mcp/src` or `config/source/skills`, update attribution fields as `owner=codex`, and then complete the fix loop through GitHub issue tracking, worktree-based code changes, PR submission, and follow-up iteration when the problem is repairable.
---

# MCP Attribution Worktree

Process MCP and skills-related attribution issues as an auditable maintenance workflow instead of ad-hoc debugging.

## What this skill does

Use this skill to:

- fetch pending MCP and skills-related attribution issues from the local report API
- inspect issue detail plus representative runs before making any status decision
- map failures back to concrete `mcp/src` tools or `config/source/skills` content, or classify them as environment, grader, or duplicate noise
- update attribution issues with concise evidence, `owner=codex`, and links to external GitHub work
- isolate each actionable repair in its own Worktrunk worktree and branch
- carry actionable issues through repo repair and PR creation instead of stopping at issue state updates
- continue from existing GitHub issues or PRs when later review or evaluation feedback shows the first direction was incomplete or wrong
- run a real post-PR evaluation when an evaluation interface is available, and use that result to decide whether another repair loop is needed

## Do not use this skill for

- generic bug fixing without attribution evidence
- unrelated attribution categories that do not map to `mcp/src` or `config/source/skills`
- bulk repo changes unrelated to a specific attribution issue
- direct database or backend mutation outside the documented report API endpoints

## Workflow

1. Start with focused `tool` and `skill` backlog queries.
2. Process one issue at a time. Never mix evidence, notes, or worktrees across issues.
3. Run the existing-artifact preflight before choosing the representative run: read the issue detail, current notes, existing `externalUrl`, and the state of any linked GitHub issue or PR. If a GitHub issue or PR already exists, treat it as part of the current state, not as proof the work is finished.
4. Read at least one run's `result` and `trace`. Prefer to also read `evaluation-trace`.
5. Check the relevant implementation in `mcp/src` or `config/source/skills` before deciding whether the issue is actionable.
6. If the issue is actionable in repo code or skills content, do not stop at attribution triage. Open or link the matching GitHub issue, create a dedicated Worktrunk worktree, implement the fix, validate it, and prepare a PR.
7. If review comments, review decisions, or later evidence show the direction is wrong, start another focused iteration from the existing GitHub issue or PR context and continue improving instead of treating the first PR as final.
8. Update attribution fields through the report API after you have the right evidence, and update them again when the GitHub issue, PR, or evaluation result becomes available.
9. Before changing an attribution to `resolved`, run a closure preflight on the linked GitHub artifact again: reread the latest PR comments, review comments, review decisions, and issue comments after the most recent code push or evaluation result.
10. When a real evaluation interface exists, run a post-PR evaluation and use the result plus the closure preflight to decide whether to continue iterating or mark the issue closed.
11. Only stop after the issue is either clearly non-actionable or has been carried through the repair loop as far as the current environment allows.
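
The backlog and evidence steps of the workflow above can be sketched as small helpers. The endpoints match the quick commands later in this skill; the helper names themselves are illustrative only:

```bash
# Sketch of the workflow's backlog query and evidence-read steps.
# Endpoints come from this skill's quick commands; helper names are illustrative.
API='http://127.0.0.1:5174/api'

# Build a focused backlog query for one category (workflow step 1)
backlog_url() {
  local category="$1"
  printf '%s/attributions?category=%s&resolutionStatus=todo&limit=50\n' "$API" "$category"
}

# The minimum evidence reads required before any status change (workflow step 4)
evidence_urls() {
  local caseId="$1" runId="$2"
  printf '%s/runs/%s/%s/result\n' "$API" "$caseId" "$runId"
  printf '%s/runs/%s/%s/trace\n' "$API" "$caseId" "$runId"
}

backlog_url tool
backlog_url skill
```

Feed each URL to `curl -s` one issue at a time; never interleave evidence reads across issues.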

## Common requests

- "Automatically process the pending MCP attribution issues."
- "Look at the tool and skills attribution backlog, fix the real issues, and update attribution with evidence."
- "Find valuable MCP attribution problems and fix them in isolated worktrees."
- "For each tool issue, decide whether it is a real `mcp/src` bug or just evaluation noise."
- "Continue iterating on the existing issue or PR after review comments."
- "After opening the PR, run a real evaluation and fix the next round if it still fails."

## Routing

| Task | Read |
| --- | --- |
| Run the report API triage flow and update attribution fields across tool and skills-related issues | `references/report-api-workflow.md` |
| Decide whether an issue is valuable and map it to `mcp/src` or `config/source/skills` | `references/value-triage.md` |
| Create GitHub issues, use Worktrunk, and repair the repo in isolation | `references/worktree-repair.md` |
| Continue from review feedback or real evaluation results after a PR already exists | `references/iteration-loop.md` |
| Trigger real evaluation runs and interpret the result | `references/evaluation-verification.md` |
| Dispatch one issue per worker and enforce closure-sweep rules in sub-agent prompts | `references/subagent-orchestration.md` |

## Operating rules

- Only update attribution issues through the local report API.
- Treat `owner` as fixed: always set it to `codex` when you patch an attribution.
- Before any new iteration, always inspect the latest attribution `notes`, `externalUrl`, linked GitHub issue or PR status, and any available PR comments or review decisions.
- Before moving any issue to `resolved`, always perform a fresh closure sweep on the linked issue or PR after the latest push or evaluation has completed. Do not rely on an earlier preflight.
- Do not change `resolutionStatus` until you have read at least one related run's `result` and `trace`.
- Do not mark an issue `resolved` without clear closure evidence such as an existing GitHub issue, PR, merged fix, or a verified duplicate that already has external tracking.
- Do not mark an issue `resolved` if there are unread or unaddressed PR comments, review comments, review decisions, or issue comments that arrived after the last time you inspected the linked artifact.
- Keep `notes` short but auditable. Include the representative run, the main failing signal, and the code or tool signal that supports the conclusion.
- If the evidence is incomplete, keep the issue `todo` or move it to `in_progress` and explicitly state what is still missing.
- For a real and repairable issue in `mcp/src` or `config/source/skills`, the default expectation is full follow-through: attribution triage, GitHub issue linkage, isolated worktree repair, validation, and PR creation.
- When the fix belongs to CloudBase skill content, edit `config/source/skills/` as the source of truth. Do not treat the root `skills/` directory as the source for those external skills.
- Only stop at status-only attribution updates when the issue is non-actionable, blocked by missing evidence, blocked by missing Worktrunk, or clearly outside MCP repo control.
- Do not use broad uncategorized backlog queries as the default source of work. Only use them in explicit fallback mode when category labels are incomplete or the user asks for a full backlog sweep.
- Items discovered through fallback broad queries must not enter the repair queue until run evidence clearly shows they belong to `mcp/src` or `config/source/skills`.
- Prefer one sub-agent per issue when sub-agent support exists. Give each sub-agent ownership of exactly one issue. If sub-agents are unavailable, process issues serially and keep a strict one-issue-at-a-time context.
- If a repair is needed, use Worktrunk's `wt` workflow for the isolated worktree. If `wt` is unavailable, stop and report that Worktrunk is missing instead of silently falling back to a shared checkout.
- Never reuse the same worktree for multiple attribution issues.
- Do not open or update GitHub issues until you have enough run evidence to explain the problem clearly.
- If an issue already has a GitHub issue or PR, read its current state before starting a new branch or changing direction: open or closed status, latest comments, review decisions, and whether the linked work is already stale or superseded.
- Review comments and post-PR evaluation failures are part of the same repair loop. Use them to drive another iteration instead of prematurely closing the attribution.
- If a real evaluation interface is available, prefer leaving the attribution `in_progress` until the repaired branch or PR passes a fresh evaluation round.
- Do not claim validation success from reasoning alone. Use the evaluation API and the final run result whenever that interface is available.
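
As a concrete sketch of an attribution update, the payload below stays within the fields this skill allows (`resolutionStatus`, `owner`, `notes`, `externalUrl`). The report API's exact HTTP method and body shape are assumptions, so the request itself is left commented out:

```bash
# Hypothetical attribution update: the field names come from this skill,
# but the PATCH method and JSON body shape are assumptions about the report API.
issueId='<issueId>'
payload=$(cat <<'EOF'
{
  "resolutionStatus": "in_progress",
  "owner": "codex",
  "notes": "run=<caseId>/<runId>; signal=<main failing signal>; code=<mcp/src or skills path>",
  "externalUrl": "https://github.com/TencentCloudBase/CloudBase-MCP/issues/<number>"
}
EOF
)
echo "$payload"
# curl -s -X PATCH "http://127.0.0.1:5174/api/attributions/$issueId" \
#   -H 'Content-Type: application/json' -d "$payload"
```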

## Required preflight

Before starting a fresh diagnosis or code iteration for an attribution issue, complete this checklist in order:

1. Read the attribution detail plus the latest `notes`.
2. Read `externalUrl` if present.
3. If `externalUrl` points to a GitHub issue, check whether it is open or closed and whether later comments changed the fix direction.
4. If `externalUrl` points to a PR, check whether it is open, merged, closed, or superseded.
5. Read PR comments, review comments, and review decisions before deciding whether to continue the same branch or start a new iteration.
6. Only after that, pick the representative run and continue into `result`, `trace`, and `evaluation-trace`.

## Required closure preflight

Before changing an attribution to `resolved`, complete this checklist in order even if you already did the normal preflight earlier:

1. Reopen the linked GitHub issue or PR.
2. Re-read the latest top-level comments, review comments, and review decisions.
3. Confirm whether any comment arrived after the latest code push, PR update, or evaluation result.
4. If new feedback exists, keep the attribution `in_progress` and continue the loop.
5. Only if there is no newer unresolved feedback, evaluate whether closure evidence is now strong enough.

## Quick commands

```bash
# List the pending tool and skill attribution backlogs
curl -s 'http://127.0.0.1:5174/api/attributions?category=tool&resolutionStatus=todo&limit=50'
curl -s 'http://127.0.0.1:5174/api/attributions?category=skill&resolutionStatus=todo&limit=50'

# Inspect one attribution and its representative run evidence
curl -s "http://127.0.0.1:5174/api/attributions/<issueId>"
curl -s "http://127.0.0.1:5174/api/runs/<caseId>/<runId>/result"
curl -s "http://127.0.0.1:5174/api/runs/<caseId>/<runId>/trace"
curl -s "http://127.0.0.1:5174/api/runs/<caseId>/<runId>/evaluation-trace"

# Isolate the repair in a dedicated Worktrunk worktree
wt switch --create feature/attribution-<slug>

# Track the fix on GitHub
gh issue create --repo TencentCloudBase/CloudBase-MCP
gh pr view <number> --comments --repo TencentCloudBase/CloudBase-MCP
gh pr create --repo TencentCloudBase/CloudBase-MCP

# Trigger a real evaluation run
curl -s -X POST http://127.0.0.1:5174/api/evaluations
```

## Minimum self-check

- Did I complete the existing-artifact preflight before starting a new diagnosis or code iteration?
- Did I inspect at least one related run's `result` and `trace` before changing status?
- Did I keep this issue isolated from every other issue?
- Is the issue actually actionable in `mcp/src` or `config/source/skills`, or is it environment / grader noise?
- If the issue was actionable, did I continue into GitHub issue / worktree / PR work instead of stopping at triage?
- If there was already a PR or review thread, did I continue from that feedback instead of ignoring it?
- If a real evaluation interface was available, did I run a fresh evaluation before treating the issue as closed?
- Did I base the validation conclusion on the final evaluation result instead of my own guess?
- If I patched the attribution, did I keep the change limited to `resolutionStatus`, `owner`, `notes`, and `externalUrl` when relevant?
- If I started a fix, does it live in its own Worktrunk worktree and branch?
- If I marked something `resolved`, is there explicit closure evidence?
- If I marked something `resolved`, did I re-read the latest PR comments, review comments, review decisions, and issue comments immediately before closing it?
# Evaluation Verification

## Purpose

Use this reference when you need to verify that a proposed fix or repair direction actually works in a real evaluation run.

## Core rule

Do not treat implementation or static reasoning as proof of success.

When the AI Coding Eval Report API is available, validation means:

1. trigger a real evaluation run
2. wait for it to finish
3. read the final result
4. decide pass or fail from the returned run result

## Required flow

After you believe the implementation is ready:

1. `POST /api/evaluations`
2. record `caseId` and `runId`
3. poll `GET /api/evaluations/{caseId}/{runId}`
4. once finished, read `GET /api/runs/{caseId}/{runId}/result`
5. optionally read `trace` and `evaluation-trace` if the run failed or is ambiguous

Do not claim "validated" before step 4 is complete.
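
The polling step can be sketched as a bounded loop. The `status` field name and the `running`/`pending` in-progress values are assumptions about the evaluation endpoint's response shape:

```bash
# Bounded polling sketch; the "status" field and its in-progress values
# ("running", "pending") are assumptions about the evaluation response.
poll_evaluation() {
  local fetch_cmd="$1" max_tries="${2:-120}"
  local status tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    status=$($fetch_cmd | grep -o '"status": *"[^"]*"' | head -n 1 | sed 's/.*"\(.*\)"/\1/')
    case "$status" in
      running|pending|"") tries=$((tries + 1)); sleep 5 ;;
      *) echo "$status"; return 0 ;;   # any other value is treated as terminal
    esac
  done
  echo timeout
  return 1
}

# Real usage (endpoint from this reference, response shape assumed):
#   poll_evaluation 'curl -s http://127.0.0.1:5174/api/evaluations/<caseId>/<runId>'
```

Only after the loop reports a terminal status should you read the run `result` and decide pass or fail.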

## Required request shape

The request body must include:

- `caseId`
- `config`

Typical config fields:

- `mcp`
- `tcbCli`
- `skillsMode`
- `mcpPackage`
- `skillsRepo`
- `skillsRef`
- `skillsPath`
- `allInOneSkillsRepo`
- `allInOneSkillsRef`
- `allInOneSkillsLocalPath`
- `apiSkillsRepo`
- `apiSkillsRef`
- `apiSkillsPath`
- `agentType`
- `model`
- `evalModel`
- `maxTurns`
- `caseTimeoutSeconds`
- `evaluationTimeoutMs`
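
A minimal request-body sketch using a subset of the fields above; every value shown is an illustrative assumption, not a documented default, so the request itself is left commented out:

```bash
# Hypothetical evaluation request body; field names come from this
# reference, the values are illustrative assumptions only.
request=$(cat <<'EOF'
{
  "caseId": "<caseId>",
  "config": {
    "mcpPackage": "/abs/path/to/mcp/dist/cli.cjs",
    "skillsMode": "allinone",
    "model": "<model>",
    "evalModel": "<evalModel>",
    "maxTurns": 50,
    "caseTimeoutSeconds": 1800
  }
}
EOF
)
echo "$request"
# curl -s -X POST http://127.0.0.1:5174/api/evaluations \
#   -H 'Content-Type: application/json' -d "$request"
```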

## Local build expectations

When local MCP code is under test:

- build `mcp` first
- pass the absolute path to `mcp/dist/cli.cjs` as `config.mcpPackage`

When `skillsMode=allinone` is under test:

- build the local all-in-one skill bundle first
- pass the absolute bundle path as `config.allInOneSkillsLocalPath`

## Result interpretation

Use the final run result as the source of truth.

### Pass

Treat the evaluation as passed when:

- `result.status == "pass"`
- and there is no failed test signal

### Fail

Treat the evaluation as failed when any of these are true:

- `result.status == "fail"`
- `result.status == "error"`
- `result.status == "timeout"`
- `tests.failed > 0`
- `error` is present
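
The pass and fail rules above collapse into one helper. The field names follow this reference, but locating them by substring match is a deliberate simplification over a real JSON parser:

```bash
# Decide pass/fail from a run-result JSON string per the rules above.
# Substring matching on "status", "failed", and "error" is a simplification;
# a real implementation should use a JSON parser.
interpret_result() {
  local json="$1" status failed
  status=$(printf '%s' "$json" | grep -o '"status": *"[^"]*"' | head -n 1 | sed 's/.*"\(.*\)"/\1/')
  failed=$(printf '%s' "$json" | grep -o '"failed": *[0-9]*' | head -n 1 | grep -o '[0-9]*$')
  if [ "$status" = "pass" ] && [ "${failed:-0}" -eq 0 ] && ! printf '%s' "$json" | grep -q '"error"'; then
    echo pass
  else
    echo fail
  fi
}

interpret_result '{"result": {"status": "pass"}, "tests": {"failed": 0}}'   # → pass
```

Any `fail`, `error`, or `timeout` status, any failed test, or a present `error` key all map to `fail`, matching the list above.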

## Failure follow-up

If the run fails:

1. capture the failure summary
2. inspect `trace` and `evaluation-trace`
3. decide whether the problem is:
- still the same root cause
- a new regression
- environment noise
4. continue the repair loop if needed

## Reporting format

Every verification summary should include at least:

- `caseId`
- `runId`
- evaluation status
- overall score when available
- tests passed / failed / total when available
- whether validation passed
- failure reason when it did not pass
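
A minimal summary in that shape might look like this (all values hypothetical):

```text
caseId=<caseId>; runId=<runId>; status=fail; score=<overall score>;
tests=<passed>/<total> passed (<failed> failed); validation=failed;
reason=<main failing signal from result and trace>
```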

## Attribution rule

If real evaluation is available, do not move a repairable issue to `resolved` until the final evaluation result supports closure or there is another explicit closure reason.
# Iteration Loop

## Purpose

Use this reference when an attribution issue already has a linked GitHub issue or PR, or when a new review or evaluation result arrives after the first repair attempt.

## Core rule

Treat GitHub issues, PRs, review comments, and fresh evaluation results as part of the same repair loop.

Do not assume that:

- the first PR is final
- opening a PR is enough to mark the attribution closed
- an existing `externalUrl` means no more work is needed

## Existing artifact first

When an attribution already has `externalUrl` or notes pointing to previous work:

1. Read that issue or PR first.
2. Capture its current state:
- issue open or closed
- PR open, merged, closed, or superseded
3. Read comments, review comments, review decisions, and any follow-up discussion.
4. Compare that feedback with the latest attribution run evidence.
5. Decide whether to continue the same approach or redirect it.

Do not start a new branch, worktree, or diagnosis pass until this check is complete.

## Closure sweep

Before you change an attribution from `in_progress` to `resolved`, repeat the artifact check one more time even if you already did it at the start of the iteration.

You must re-read:

1. the latest top-level PR comments
2. review comments
3. review decisions
4. linked issue comments when relevant

Treat any newer unresolved feedback as a new iteration signal. Do not close the attribution based on an older snapshot of the PR.
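
One way to mechanize "arrived after the latest push": ISO-8601 UTC timestamps compare correctly as plain strings, so a small helper can flag newer feedback. Where the timestamps come from (for example `gh ... --json` output) is an assumption:

```bash
# Returns "iterate" when any feedback timestamp is newer than the latest
# push or evaluation timestamp; ISO-8601 UTC strings compare correctly as text.
closure_signal() {
  local last_push="$1"; shift   # e.g. 2024-05-01T10:00:00Z
  local t
  for t in "$@"; do             # feedback timestamps (comments, reviews)
    if [[ "$t" > "$last_push" ]]; then
      echo iterate
      return 0
    fi
  done
  echo sweep-clean
}
```

An `iterate` result means the sweep found newer feedback, so the attribution stays `in_progress`.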

## Continue vs restart

### Continue the same PR

Prefer continuing the same PR when:

- review feedback asks for corrections, tightening, or missing edge cases
- the overall repair direction is still correct
- the next change is an iteration, not a different root-cause theory

### Start a new iteration path

Prefer a new worktree or branch when:

- review or fresh evaluation shows the first direction was wrong
- the repair target moved from `mcp/src` to `config/source/skills`, or vice versa
- the PR became too mixed or too far from the new diagnosis
- the linked PR is closed, stale, or superseded and continuing it would hide the new root cause

## Post-PR evaluation

If a real evaluation interface exists:

1. run a fresh evaluation after the PR or branch update
2. inspect the resulting run and failed checks
3. decide whether:
- the issue is now closed
- the issue needs another iteration
- the issue turned out to be grader or environment noise

Use that new evaluation as stronger closure evidence than static reasoning alone.

## Attribution state during iteration

### Keep `in_progress`

Keep the attribution `in_progress` while:

- a PR exists but review feedback is unresolved
- a fresh evaluation still fails
- the next iteration is already clear

### Move to `resolved`

Move to `resolved` only when:

- the PR or linked issue provides explicit closure
- and any available fresh evaluation no longer shows the original failure mode
- and the closure sweep found no newer unresolved PR comments, review comments, review decisions, or issue comments

### Move to `invalid`

Move to `invalid` when later review or evaluation shows:

- the original attribution was wrong
- the real failure was environment or grader noise
- no repo or skill-source change is actually needed

## Notes guidance

When iterating, append evidence like this:

```text
iteration=<n>; prior=<issue or PR link>; new_signal=<review or eval summary>; conclusion=<continue or redirect>
```

This keeps the attribution auditable across multiple rounds.

## Mandatory iteration checklist

Before each new iteration:

1. read attribution `notes`
2. read `externalUrl`
3. inspect linked issue or PR state
4. read comments and review decisions
5. decide continue vs restart
6. only then inspect the next representative run

Before `resolved`, repeat steps 2-4 after the latest push or evaluation result so the closure decision is based on a fresh artifact read rather than an older preflight snapshot.