
feat(mcp): impact_analysis v2 — downstream value_diff + data_impact field#1258

Open
iamcxa wants to merge 8 commits into main from feat/mcp-impact-analysis-v2

Conversation

@iamcxa
Contributor

@iamcxa iamcxa commented Mar 27, 2026

Summary

Extends impact_analysis to run value_diff on downstream table models and adds a data_impact field so agents can distinguish DAG-reachable models from actually data-affected models.

Problem

  • Downstream models listed as confirmed_impacted with value_diff: null — agents can't tell if data actually changed
  • View-modified scenarios (e.g., stg_orders → orders) produce total_affected_row_count=0 because downstream tables skip value_diff
  • _guidance text says "DO NOT OVERRIDE" — forces agents to blindly trust noisy DAG classifications

Changes

| Commit | Description |
| --- | --- |
| 184d00e3 | Rename response fields: confirmed_impacted_models, confirmed_not_impacted_models, total_affected_row_count |
| 4f698386 | Include views in row_count_diff (was incorrectly skipping them) |
| 45010042 | Extend value_diff to downstream table models + add data_impact field (confirmed/none/potential) + rewrite _guidance |
| a5e65b0e | Clarify guidance text for skip_downstream_value_diff case |
| b2d7e9bc | Add 7 behavioral tests for data_impact, downstream value_diff, skip flags, and guidance |

New data_impact field

| Value | Meaning | Trigger |
| --- | --- | --- |
| confirmed | Value diff verified data changes exist | affected_row_count > 0 |
| none | Value diff verified zero data changes | affected_row_count == 0 |
| potential | No value diff available (views, no PK, skipped) | value_diff is null |

When data_impact="potential", affected_row_count is forced to null to avoid misleading row_count fallback values.
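The table above maps to a small piece of classification logic. As a minimal sketch (the function name is illustrative; the real logic lives in mcp_server.py and may differ in detail):

```python
def classify_data_impact(value_diff):
    """Map a model's value_diff result to a data_impact token.

    Sketch of the classification described above, not the actual
    implementation.
    """
    if value_diff is None:
        # Views, missing PK, or skipped: no verification ran.
        # affected_row_count is forced to null so a row_count
        # fallback can't masquerade as verified data impact.
        return {"data_impact": "potential", "affected_row_count": None}
    if value_diff["affected_row_count"] > 0:
        return {"data_impact": "confirmed",
                "affected_row_count": value_diff["affected_row_count"]}
    return {"data_impact": "none", "affected_row_count": 0}
```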

New skip_downstream_value_diff parameter

  • Default false: run value_diff on ALL impacted table models (modified + downstream)
  • true: skip downstream models (current behavior, opt-out for large DAGs)

Backward compatibility

All changes are additive — no existing fields removed, skip_downstream_value_diff defaults to false, list names unchanged.

Test plan

  • 101 unit tests pass (7 new behavioral tests in TestImpactAnalysisBehavior)
  • Verify data_impact field present on all models in confirmed_impacted_models
  • Verify data_impact="potential" forces affected_row_count=null
  • Verify skip_downstream_value_diff=true skips downstream value_diff
  • Verify skip_value_diff=true takes precedence over skip_downstream_value_diff
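The skip-flag precedence in the last item can be sketched as a single predicate (a hypothetical helper; the real check sits inside the value_diff loop):

```python
def should_run_value_diff(is_downstream, skip_value_diff=False,
                          skip_downstream_value_diff=False):
    """Decide whether value_diff runs for one model (sketch).

    skip_value_diff disables value_diff everywhere and takes
    precedence over the downstream-only flag.
    """
    if skip_value_diff:
        return False
    if is_downstream and skip_downstream_value_diff:
        return False
    return True
```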

🤖 Generated with Claude Code

iamcxa and others added 6 commits March 27, 2026 17:14
…ompliance

Rename response fields to signal authority and match consumer output schema:
- impacted_models → confirmed_impacted_models
- not_impacted_models → confirmed_not_impacted_models
- Add per-model affected_row_count (value_diff total or abs(row_count.delta) fallback)
- Add response-level total_affected_row_count (max across models)
- value_diff.rows_changed → value_diff.affected_row_count (per-column too)

"confirmed_" prefix prevents agents from overriding DAG classifications
with their own analysis. Pre-computed affected_row_count eliminates
semantic translation errors (agent copies directly instead of interpreting).

Eval result (ch3-phantom-filter, Sonnet, bare mode):
  Before: 8/12 (agent overrides 3 downstream models + wrong row count)
  After:  12/12 (agent copies confirmed lists + total_affected_row_count)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Views were skipped from row_count_diff, causing affected_row_count
to be null/0 for view-only changes. Views support SELECT COUNT(*)
and their row count delta is essential for detecting filtered rows
(e.g., WHERE clause dropping rows from a staging view).

Only value_diff (PK Join) should skip views (expensive). Row count
comparison is cheap and should always run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…_impact field

- Add skip_downstream_value_diff parameter (opt-out for large DAGs)
- Remove downstream skip in value_diff loop (table models now compared)
- Add data_impact field: confirmed/none/potential per model
- Force affected_row_count=null when data_impact=potential
- Rewrite _guidance from prescriptive to descriptive

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
… case

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…f, and guidance

Adds TestImpactAnalysisBehavior covering: all models have data_impact field,
confirmed/none/potential classification logic, null affected_row_count for
potential models, guidance text rules, skip_downstream_value_diff, and
skip_value_diff precedence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@even-wei
Contributor

Design Proposal: impact_analysis as Incremental Triage

@iamcxa — Great foundation with the data_impact field and skip_downstream_value_diff. Before we merge, I want to propose a design direction that builds on your work and makes impact_analysis more effective for agent-driven PR validation.

Core Principles

  1. Start cheap, follow the DAG — Check modified models first (using state:modified.body state:modified.macros state:modified.contract for precise selection — not state:modified which includes cosmetic changes). Check terminal consumer nodes last.
  2. Metadata before data — Row count + schema are free signals. Data-level queries (query_diff, value_diff) are expensive. Only run them when metadata signals warrant investigation.
  3. impact_analysis = triage, not results — Its job is to point out what to check, not do the checking. Keep it light. Let agents call targeted tools based on triage signals.
  4. Row count ≠ signal of correctness — Row count delta direction depends entirely on the PR's intent. A WHERE clause change should change row count. A calculation fix shouldn't. Triage reports the signal without judging it. Priority is driven by code change type (body/macros/contract), not data change direction.

Three-Layer Validation Pyramid

         ┌──────────────┐
         │  value_diff  │  Expensive: full row-level JOIN
         │ profile_diff │  Only 1-2 models reach here
         └──────┬───────┘
            ┌───┴────┐
            │ query_ │  Cheap: targeted aggregate query
            │  diff  │  Filters out models with no real change
            └───┬────┘
        ┌───────┴─────────┐
        │ impact_analysis │  Free: metadata only (row_count + schema)
        │    (triage)     │  All impacted models
        └─────────────────┘

| Layer | Tool | Cost | Purpose |
| --- | --- | --- | --- |
| 1. Triage | impact_analysis | ~0 (metadata) | Classify all models, suggest what to check |
| 2. Verify | query_diff | 1 aggregate query/model | Confirm whether data actually changed |
| 3. Investigate | value_diff / profile_diff | Full scan | Understand how data changed, row by row |

Most models get filtered out at each layer. The expensive step only runs on the 1-2 models that actually need it.

Key change: impact_analysis suggests the verification query

Instead of running value_diff itself, impact_analysis returns a per-model next_action with a suggested query for query_diff. The agent follows the suggestion directly — no SQL construction needed:

{
  "name": "orders",
  "change_status": "modified",
  "materialized": "table",
  "row_count": {"base": 100, "current": 100, "delta": 0},
  "schema_changes": [],
  "next_action": {
    "tool": "query_diff",
    "query": "SELECT COUNT(*), SUM(amount), AVG(amount) FROM {{ orders }}",
    "reason": "body changed, row_count stable — verify values unchanged",
    "priority": "high"
  }
}

impact_analysis can build the suggested query from lineage metadata — it knows each model's columns, picks numeric columns for SUM()/AVG(), categorical for COUNT(DISTINCT ...). If query_diff shows divergence, the agent escalates to value_diff on that specific model.
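The column-driven query construction could look roughly like this (a sketch under the assumption that column names and types are available from lineage metadata; the function and type names are illustrative):

```python
NUMERIC_TYPES = {"integer", "bigint", "double", "numeric", "float"}

def build_suggested_query(model_name, columns):
    """Build the aggregate query a next_action could suggest.

    `columns` maps column name -> type string. Numeric columns get
    SUM/AVG, other columns COUNT(DISTINCT ...). Illustrative only.
    """
    aggs = ["COUNT(*)"]
    for name, col_type in columns.items():
        if col_type.lower() in NUMERIC_TYPES:
            aggs.append(f"SUM({name})")
            aggs.append(f"AVG({name})")
        else:
            aggs.append(f"COUNT(DISTINCT {name})")
    # {{ model }} is the templated relation reference query_diff resolves
    return f"SELECT {', '.join(aggs)} FROM {{{{ {model_name} }}}}"
```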

Per-model next_action priority logic

Priority is driven by code change type, not data delta direction:

modified model + schema changes           → profile_diff(changed_cols), high
modified model + row_count stable         → query_diff (verify no silent value changes), high
modified model + row_count changed        → query_diff (verify shape), high
downstream table                          → query_diff, medium
downstream view                           → query_diff or skip, low
any model + no PK (if escalation needed)  → profile_diff, medium

Agent interaction flow

Agent → impact_analysis()
       ← "orders: body changed, row_count stable
           next_action: query_diff with suggested query (high)
          customers: downstream, row_count stable
           next_action: query_diff with suggested query (medium)
          daily_view: downstream view
           next_action: skip (low)"

Agent → query_diff(orders, suggested_query)
       ← base: (100, 5000, 50.0)  current: (100, 5230, 52.3)
          "SUM(amount) +4.6%, AVG(amount) +4.6%"

Agent → "orders aggregates diverge — drilling into row-level diff"

Agent → value_diff(orders)
       ← "5 rows changed in amount column"

Agent → "PR impact: orders calculation fix affects 5 rows,
          SUM(amount) increased 4.6%. customers unchanged."

Each agent action is justified by the previous result, not a bulk dump.

What to keep from this PR

  • data_impact field (confirmed/none/potential) — useful after agent runs verification
  • affected_row_count in value_diff response — good field addition
  • ✅ Views included in row_count — useful metadata signal
  • ✅ Behavioral test structure — adapt to new design

What to change

  • Remove value_diff from the default impact_analysis path
  • Remove suggested_deep_dives (replaced by next_action)
  • Fix naming: total_affected_row_count → rename or remove (currently computing max, not total)
  • Fix E2E tests (test_mcp_e2e.py) for field renames
  • Update _guidance to teach the triage → verify → investigate workflow

Suggested PR split

  1. PR A (this PR, slim down): Field renames + data_impact + E2E fixes + remove value_diff from default path
  2. PR B (follow-up): Per-model next_action with suggested query_diff queries, updated guidance, interaction sequence tests

Thoughts?

@iamcxa
Contributor Author

iamcxa commented Mar 31, 2026

Quick Validation: Agent Turn Budget Comparison

Before deciding the architecture direction, here's a concrete side-by-side of how each approach affects the Haiku subagent's workflow for a typical scenario (2 modified + 1 downstream model).

Scenario

  • orders: modified, has PK, row_count stable
  • customers: modified, schema change (new column), row_count stable
  • revenue_report: downstream, row_count stable

Approach A — Current design (value_diff inline)

Call 1: impact_analysis()
        → orders: data_impact="confirmed", 5 rows changed, amount +4.6%
        → customers: data_impact="none", 0 rows changed, email column added
        → revenue_report: data_impact="potential"

Call 2: profile_diff(revenue_report)     ← follow suggested_deep_dives
Call 3: list_checks + run_check
Call 4: create_check (×N)

Agent has 80% of the analysis conclusion after Call 1. The mechanical work (lineage classification, row_count, schema_diff, value_diff) is all done by the tool. Agent only needs to interpret and report.

Approach B — Proposal (next_action + query_diff pyramid)

Call 1: impact_analysis()
        → orders: data_impact="potential", next_action: query_diff(suggested SQL)
        → customers: data_impact="potential", next_action: profile_diff
        → revenue_report: data_impact="potential", next_action: query_diff(suggested SQL)

Call 2: query_diff(orders, suggested SQL)        ← agent follows next_action
Call 3: query_diff(revenue_report, suggested SQL) ← agent follows next_action
Call 4: profile_diff(customers)                   ← agent follows next_action
Call 5: value_diff(orders)                        ← agent drills down on divergence
Call 6: list_checks + run_check
Call 7: create_check (×N)

Agent needs 3-4 extra calls to reach the same conclusion Approach A gives in Call 1.

The key question

The original goal of impact_analysis is to extract mechanical steps from the Haiku subagent — reducing both the prompt complexity and the agent's opportunity to make mistakes (wrong model selection, wrong SQL, wrong interpretation of table_not_found, etc.).

Looking at the current agent.ts prompt (~250 lines), large sections are mechanical instructions that exist because the agent must orchestrate individual tools:

| Prompt section | Lines | Needed if impact_analysis handles it? |
| --- | --- | --- |
| Model Selection for Data Queries | ~25 | ❌ impact_analysis already filters |
| table_not_found handling | ~20 | ❌ impact_analysis handles errors internally |
| Data Verification Before Reporting | ~15 | ❌ impact_analysis returns verified data |
| Workflow steps 1-4 | ~15 | ❌ replaced by single impact_analysis call |

With inline value_diff (Approach A): ~75 lines of mechanical prompts become unnecessary. Haiku just interprets results.

With pyramid (Approach B): Agent still needs query_diff orchestration prompts, SQL template handling, and divergence-based escalation logic — the mechanical prompt burden shifts but doesn't shrink.

Existing escape hatches

skip_value_diff=True and skip_downstream_value_diff=True already handle the "large DAG" performance concern without removing value_diff from the default path.

Suggestion

Keep value_diff inline (maximizes mechanical extraction). Replace suggested_deep_dives with next_action (better UX for remaining agent decisions). This gives the best of both: impact_analysis handles all mechanical work, agent prompt stays minimal.

Running a real A/B test next to confirm with actual data.

@iamcxa
Contributor Author

iamcxa commented Mar 31, 2026

A/B Test Results + Design Direction Analysis

Real A/B Test (DuckDB, 3-model DAG)

Ran impact_analysis against a realistic scenario: customers (modified + schema change), orders (modified, values changed), revenue_report (downstream, unchanged).

| Metric | A (value_diff inline) | B (skip_value_diff) |
| --- | --- | --- |
| Models with definitive answer after 1 call | 2/3 | 0/3 |
| Models needing further investigation | 1/3 | 3/3 |
| Agent knows "confirmed no change" | 1 (revenue_report) | 0 |

Key finding: With inline value_diff, the agent immediately knows orders has 2 rows changed (amount +33%) and revenue_report has zero data changes. Without it, all 3 models are data_impact: "potential" — the agent must make 3+ additional tool calls to reach the same conclusion.

The 1 model that remained potential (customers) was due to a schema change (columns differ between base/current, so the FULL OUTER JOIN can't compare). This is correct behavior — the tool gracefully degrades and suggests profile_diff for further investigation.

Does Approach A achieve the original goal?

The original goal: extract mechanical steps from the Haiku subagent to reduce prompt complexity, improve correctness, and increase technical moat.

What impact_analysis successfully extracts (high-risk mechanical work):

  • dbt selector resolution (state:modified.body+ state:modified.macros+ state:modified.contract+)
  • Lineage DAG classification (modified vs downstream vs unimpacted)
  • Row count diff for all impacted models
  • Schema diff (column add/remove/modify detection)
  • Value diff via FULL OUTER JOIN with PK detection from dbt test metadata
  • Error classification (table_not_found, permission_denied) with per-step resilience
  • Deterministic deep dive suggestions (R1-R4 rules)

This is 383 lines of Python with 78 control flow branches — all of which the Haiku agent would otherwise need to orchestrate via 6-8 sequential tool calls with conditional logic at each step. The data_impact field collapses a three-way reasoning distinction (null vs zero vs not-run) into 3 unambiguous tokens.
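The value-diff join mentioned above follows a standard shape. A minimal sketch of the generated comparison SQL, assuming base/current relation prefixes and a single PK column (all names here are illustrative, not the tool's actual SQL):

```python
def build_value_diff_sql(table, pk, columns):
    """Sketch of a value_diff query: FULL OUTER JOIN base and current
    relations on the primary key and count rows where any compared
    column differs (IS DISTINCT FROM is null-safe inequality)."""
    mismatch = " OR ".join(
        f"b.{c} IS DISTINCT FROM c.{c}" for c in columns
    )
    return (
        f"SELECT COUNT(*) AS affected_row_count\n"
        f"FROM base.{table} b\n"
        f"FULL OUTER JOIN current.{table} c ON b.{pk} = c.{pk}\n"
        f"WHERE {mismatch}"
    )
```

When the schemas differ between base and current, the column list can't be aligned, which is why such a comparison gracefully degrades to data_impact="potential" as described above.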

What remains in agent.ts (~100 lines of mechanical prompt):

| Section | Lines | Risk level | Why it's still in prompt |
| --- | --- | --- | --- |
| Mermaid DAG generation | ~80 | Medium | Pure template-following; summary.py already has generate_mermaid_lineage_graph() |
| create_check orchestration | ~20 | Medium | Parameter construction (type, params, name) per model |

Assessment: ~70% achieved, with clear path to 90%+

| Goal | Current | After next steps |
| --- | --- | --- |
| Reduce mechanical prompts | 70% — high-risk work extracted, low-risk remains | 90%+ |
| Technical moat | 85% — dbt internal APIs, cross-env SQL, PK detection in Python | 95% |
| Simpler plugin | 60% — agent.ts still ~250 lines | 90% (down to ~50-60 lines) |

Recommended next steps (by ROI)

| Priority | Action | Impact | Complexity |
| --- | --- | --- | --- |
| P0 | Expose render_lineage_mermaid as MCP tool (code exists in summary.py) | Eliminates ~95 lines from agent.ts | Low |
| P1 | Return suggested_checks[] with full type+params in impact_analysis | Eliminates create_check parameter construction from agent | Medium |
| P2 | Measure Haiku error rate on Mermaid/create_check via Sentry metrics | Validates whether remaining mechanical work actually causes errors | Low |

After P0+P1, agent.ts shrinks from ~250 lines to ~50-60 lines. The agent's job becomes: call impact_analysis → interpret results → follow suggested_checks → call render_lineage_mermaid → write analysis. All reasoning, minimal mechanics.

Design direction for this PR

  1. Keep value_diff inline — the A/B data confirms it's the right call for the "extract mechanical work" goal
  2. Replace suggested_deep_dives with next_action (per-model, more actionable) — this is the good idea from the original proposal
  3. skip_value_diff + skip_downstream_value_diff remain as escape hatches for large DAGs
  4. P0/P1 as follow-up PRs — don't block this PR on Mermaid extraction

@iamcxa
Contributor Author

iamcxa commented Mar 31, 2026

Updated Roadmap: Stage 1→2→3

After deeper analysis comparing the plugin agent (recce-reviewer in recce-claude-plugin) with the cloud agent (agent.ts in recce-cloud-infra), here's the refined roadmap.

Key finding: Plugin agent already surpasses cloud agent in analytical capability

| Capability | Plugin (recce-reviewer) | Cloud (agent.ts) |
| --- | --- | --- |
| Lineage classification | via impact_analysis | manual lineage_diff + parsing |
| Row count / Schema / Value diff | via impact_analysis (1 call) | 3 separate tool calls |
| data_impact (confirmed/none/potential) | via impact_analysis | agent infers manually |
| Risk assessment | HIGH/MEDIUM/LOW rules | no framework |
| DAG visualization | not available | ~80 lines Mermaid prompt |
| Check persistence | not available | create_check loop |

The plugin agent uses impact_analysis as entry point and gets more analytical signal in 1 call than the cloud agent gets in 6+ calls. The cloud agent only wins on presentation (Mermaid DAG) and persistence (create_check).

Three-stage plan

Stage 1 (DONE): impact_analysis gives the plugin agent analytical parity+

  • Plugin already uses impact_analysis as entry point
  • data_impact + value_diff + risk assessment = analytical capability exceeds cloud agent

Stage 2 (this PR): Upgrade impact_analysis so cloud agent can also benefit

  • suggested_deep_dives → next_action (per-model, with priority)
  • next_action = null when data_impact is confirmed/none (agent only acts on potential)
  • Fix total_affected_row_count naming
  • Update _guidance to teach the triage workflow
  • Keep value_diff inline (A/B validated)
  • Cloud agent.ts prompt reduction: ~250 → ~150 lines

Stage 3 — PR B (follow-up), by priority:

| Priority | Action | Impact | Complexity |
| --- | --- | --- | --- |
| P0 | render_lineage_mermaid MCP tool | Eliminates ~95 lines from agent.ts | Low (~20 lines — summary.py already has full implementation) |
| P1 | suggested_checks[] with full type+params in impact_analysis | Eliminates create_check parameter construction from agent | Medium |
| P2 | Measure Haiku error rate on remaining mechanical tasks via Sentry | Validates remaining gaps | Low |

P0 detail: render_lineage_mermaid

summary.py already contains the complete Mermaid rendering pipeline: Node/Edge/LineageGraph classes, BFS from modified nodes, per-resource-type shapes, [What's Changed] annotations with row_count delta %, schema diffs, check integration, and MAX_MERMAID_TEXT_SIZE truncation. The MCP tool is a thin wrapper (~20 lines) over generate_mermaid_lineage_graph().

This is actually richer than what agent.ts teaches Haiku to produce — the Python version includes check annotations, resource-type shapes, and size-based truncation that the 80-line agent.ts prompt doesn't cover.
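The wrapper shape could be as thin as the following sketch. The stub stands in for summary.py's real generate_mermaid_lineage_graph(); the limit value, input shape, and tool function name are assumptions for illustration:

```python
MAX_MERMAID_TEXT_SIZE = 50_000  # illustrative; the real constant lives in summary.py

def generate_mermaid_lineage_graph(lineage):
    """Stub standing in for summary.py's full implementation."""
    lines = ["graph LR"] + [f"  {src} --> {dst}" for src, dst in lineage]
    return "\n".join(lines)

def render_lineage_mermaid(lineage):
    """Thin MCP-tool-shaped wrapper (sketch): delegate to the existing
    renderer, then apply size-based truncation as summary.py does."""
    text = generate_mermaid_lineage_graph(lineage)
    if len(text) > MAX_MERMAID_TEXT_SIZE:
        text = text[:MAX_MERMAID_TEXT_SIZE] + "\n%% truncated"
    return text
```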

After P0, agent.ts shrinks to ~60 lines — same complexity level as recce-reviewer. The two agents converge to essentially the same workflow:

1. impact_analysis()           → analytical results
2. follow next_action          → profile_diff on potential models
3. render_lineage_mermaid()    → deterministic Mermaid (cloud only)
4. create_check()              → persist evidence (cloud only)
5. write summary

iamcxa and others added 2 commits March 31, 2026 18:03
- Move suggestion logic from top-level array to per-model `next_action` field
- next_action=null for confirmed/none models (no follow-up needed)
- next_action includes tool, columns, reason, priority for potential models
- Priority driven by code change type: modified+schema→high, downstream→medium
- Rename total_affected_row_count → max_affected_row_count (was computing max)
- Remove top-level suggested_deep_dives from response
- Update _guidance to teach next_action workflow
- Update E2E tests for field renames and next_action structure
- Update behavioral tests with next_action assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…logic

- Confirmed low change ratio → next_action=None
- data_impact='none' → next_action=None
- next_action field completeness (tool, columns, reason, priority)
- Response uses max_affected_row_count, not total_affected or suggested_deep_dives
- Schema change next_action has priority=high
- Downstream view next_action has priority=low

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@iamcxa
Contributor Author

iamcxa commented Mar 31, 2026

PR A Implementation Complete

Two commits pushed to implement the design changes discussed above:

Commit 1: refactor(mcp): replace suggested_deep_dives with per-model next_action

Changes to mcp_server.py:

  • Replaced top-level suggested_deep_dives array with per-model next_action field
  • data_impact=confirmed/none → next_action=null (no follow-up needed)
  • data_impact=potential → next_action={tool, columns, reason, priority}
  • Priority logic driven by code change type:
    • modified + schema_changes → high
    • modified + no value_diff (view/no PK) → high
    • downstream table → medium
    • downstream view → low
  • Confirmed models with high change ratio (R1) still get next_action (medium)
  • Renamed total_affected_row_count → max_affected_row_count
  • Updated _guidance to teach next_action workflow
  • Updated tool description to reference next_action instead of suggested_deep_dives
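The rules above can be sketched as one mapping function (illustrative only: the model dict shape is assumed, and the R1 high-change-ratio exception is omitted for brevity):

```python
def next_action_for(model):
    """Sketch of the per-model next_action rules listed above.

    `model` is a dict with data_impact, change_status, materialized,
    and schema_changes keys (shape is an assumption, not the real API).
    """
    if model["data_impact"] in ("confirmed", "none"):
        return None  # data impact verified either way; no follow-up
    # data_impact == "potential": priority driven by code change type
    if model["change_status"] == "modified":
        priority = "high"
        tool = "profile_diff" if model["schema_changes"] else "query_diff"
    elif model["materialized"] == "view":
        priority, tool = "low", "query_diff"   # downstream view
    else:
        priority, tool = "medium", "query_diff"  # downstream table
    return {"tool": tool, "priority": priority}
```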

Changes to test_mcp_e2e.py:

  • Updated all field references (impacted_models → confirmed_impacted_models, etc.)
  • Converted suggested_deep_dives tests to next_action assertions
  • Fixed view row_count test (views now included in row_count per earlier commit)
  • Fixed per-column field name (rows_changed → affected_row_count)

Commit 2: test(mcp): add coverage for next_action, field renames, and priority logic

6 new tests covering gaps identified in coverage analysis:

| Test | What it verifies |
| --- | --- |
| test_confirmed_low_change_ratio_has_null_next_action | confirmed + low change → null |
| test_none_data_impact_has_null_next_action | none (zero changes) → null |
| test_next_action_has_all_required_fields | tool, columns, reason, priority all present |
| test_response_uses_new_field_names | max_affected_row_count exists, old fields absent |
| test_next_action_schema_change (priority assert) | schema change → priority=high |
| test_next_action_downstream_view_low_priority | downstream view → priority=low |

All 67 tests pass (55 E2E + 11 behavioral + 1 registration).
