feat(mcp): impact_analysis v2 — downstream value_diff + data_impact field #1258
…ompliance

Rename response fields to signal authority and match the consumer output schema:

- impacted_models → confirmed_impacted_models
- not_impacted_models → confirmed_not_impacted_models
- Add per-model affected_row_count (value_diff total or abs(row_count.delta) fallback)
- Add response-level total_affected_row_count (max across models)
- value_diff.rows_changed → value_diff.affected_row_count (per-column too)

The "confirmed_" prefix prevents agents from overriding DAG classifications with their own analysis. Pre-computed affected_row_count eliminates semantic translation errors (the agent copies directly instead of interpreting).

Eval result (ch3-phantom-filter, Sonnet, bare mode):

- Before: 8/12 (agent overrides 3 downstream models + wrong row count)
- After: 12/12 (agent copies confirmed lists + total_affected_row_count)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Views were skipped from row_count_diff, causing affected_row_count to be null/0 for view-only changes. Views support SELECT COUNT(*), and their row count delta is essential for detecting filtered rows (e.g., a WHERE clause dropping rows from a staging view). Only value_diff (PK join) should skip views, since it is expensive; row count comparison is cheap and should always run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
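A minimal sketch of the gating this fix implies (function names and the materialization set are assumptions, not the actual implementation): only the expensive PK-join value_diff filters on materialization; the cheap COUNT(*) diff runs for everything, views included.

```python
# Materializations eligible for the expensive PK-join value_diff (assumed set).
VALUE_DIFF_MATERIALIZATIONS = {"table", "incremental"}


def should_run_row_count_diff(materialization: str) -> bool:
    # SELECT COUNT(*) works on views too and is cheap, so always compare rows.
    return True


def should_run_value_diff(materialization: str) -> bool:
    # Views are skipped only here, because the PK-join comparison is expensive.
    return materialization in VALUE_DIFF_MATERIALIZATIONS
```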
…_impact field

- Add skip_downstream_value_diff parameter (opt-out for large DAGs)
- Remove downstream skip in value_diff loop (table models now compared)
- Add data_impact field: confirmed/none/potential per model
- Force affected_row_count=null when data_impact=potential
- Rewrite _guidance from prescriptive to descriptive

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
… case

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…f, and guidance

Adds TestImpactAnalysisBehavior covering: all models have a data_impact field, confirmed/none/potential classification logic, null affected_row_count for potential models, guidance text rules, skip_downstream_value_diff, and skip_value_diff precedence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Design Proposal: impact_analysis as Incremental Triage

@iamcxa — Great foundation with the Core Principles.
Three-Layer Validation Pyramid
Most models get filtered out at each layer. The expensive step only runs on the 1-2 models that actually need it.

Key change: impact_analysis suggests the verification query

Instead of running value_diff itself, impact_analysis returns a per-model `next_action`:

```json
{
  "name": "orders",
  "change_status": "modified",
  "materialized": "table",
  "row_count": {"base": 100, "current": 100, "delta": 0},
  "schema_changes": [],
  "next_action": {
    "tool": "query_diff",
    "query": "SELECT COUNT(*), SUM(amount), AVG(amount) FROM {{ orders }}",
    "reason": "body changed, row_count stable — verify values unchanged",
    "priority": "high"
  }
}
```

impact_analysis can build the suggested query from lineage metadata — it knows each model's columns and picks numeric columns for the aggregates.
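The query-building idea above could be sketched roughly like this (function name, type names, and the SUM/AVG choice are assumptions mirroring the example response, not the actual implementation):

```python
def build_verification_query(model: str, columns: dict[str, str]) -> str:
    """Derive a suggested query_diff SQL from lineage column metadata.

    Picks numeric columns and wraps them in aggregates so the agent can
    compare base vs current with one cheap query. The Jinja-style
    {{ model }} placeholder matches the example response above.
    """
    numeric_types = {"integer", "bigint", "decimal", "numeric", "double", "float"}
    aggs = ["COUNT(*)"]
    for name, dtype in columns.items():
        if dtype.lower() in numeric_types:
            aggs.append(f"SUM({name})")
            aggs.append(f"AVG({name})")
    return f"SELECT {', '.join(aggs)} FROM {{{{ {model} }}}}"
```

For the `orders` example, metadata like `{"amount": "decimal", "status": "varchar"}` yields aggregates over `amount` only, since `status` is non-numeric.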
Quick Validation: Agent Turn Budget Comparison

Before deciding the architecture direction, here's a concrete side-by-side of how each approach affects the Haiku subagent's workflow for a typical scenario (2 modified + 1 downstream model).

Scenario
Approach A — Current design (value_diff inline)

Agent has 80% of the analysis conclusion after Call 1. The mechanical work (lineage classification, row_count, schema_diff, value_diff) is all done by the tool. The agent only needs to interpret and report.

Approach B — Proposal (next_action + query_diff pyramid)

Agent needs 3-4 extra calls to reach the same conclusion Approach A gives in Call 1.

The key question

The original goal was to extract mechanical steps from the Haiku subagent. Looking at the current agent.ts prompt (~250 lines), large sections are mechanical instructions that exist because the agent must orchestrate individual tools:
With inline value_diff (Approach A): ~75 lines of mechanical prompts become unnecessary. Haiku just interprets results.

With pyramid (Approach B): the agent still needs query_diff orchestration prompts, SQL template handling, and divergence-based escalation logic — the mechanical prompt burden shifts but doesn't shrink.

Existing escape hatches
Suggestion

Keep value_diff inline (maximizes mechanical extraction). Replace …

Running a real A/B test next to confirm with actual data.
A/B Test Results + Design Direction Analysis

Real A/B Test (DuckDB, 3-model DAG)

Ran …
Key finding: With inline value_diff, the agent immediately knows … The 1 model that remained …

Does Approach A achieve the original goal?

The original goal: extract mechanical steps from the Haiku subagent to reduce prompt complexity, improve correctness, and increase technical moat.

What impact_analysis successfully extracts (high-risk mechanical work):
This is 383 lines of Python with 78 control flow branches — all of which the Haiku agent would otherwise need to orchestrate via 6-8 sequential tool calls with conditional logic at each step.

What remains in agent.ts (~100 lines of mechanical prompt):
Assessment: ~70% achieved, with clear path to 90%+
Recommended next steps (by ROI)
After P0+P1, agent.ts shrinks from ~250 lines to ~50-60 lines. The agent's job becomes: call …

Design direction for this PR
Updated Roadmap: Stage 1→2→3

After deeper analysis comparing the plugin agent (…) with the cloud agent:

Key finding: Plugin agent already surpasses cloud agent in analytical capability
The plugin agent uses …

Three-stage plan

Stage 1 (DONE):
Stage 2 (this PR): Upgrade impact_analysis so cloud agent can also benefit
Stage 3 — PR B (follow-up), by priority:
P0 detail: render_lineage_mermaid
This is actually richer than what agent.ts teaches Haiku to produce — the Python version includes check annotations, resource-type shapes, and size-based truncation that the 80-line agent.ts prompt doesn't cover.

After P0, agent.ts shrinks to ~60 lines — same complexity level as recce-reviewer. The two agents converge to essentially the same workflow: …
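A toy sketch of what a lineage-to-Mermaid renderer like this does (the function signature and shape mapping are assumptions; the real render_lineage_mermaid also adds check annotations and size-based truncation, which this omits):

```python
def render_lineage_mermaid(nodes: dict[str, str], edges: list[tuple[str, str]]) -> str:
    """Render a lineage DAG as Mermaid flowchart text.

    `nodes` maps model name -> resource type; the resource type picks the
    Mermaid node shape (e.g. stadium for sources, rectangle for models).
    """
    shapes = {"model": ("[", "]"), "source": ("([", "])"), "seed": ("[(", ")]")}
    lines = ["graph LR"]
    for name, rtype in nodes.items():
        left, right = shapes.get(rtype, ("[", "]"))
        lines.append(f"    {name}{left}{name}{right}")
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)
```

Emitting this from the tool, rather than teaching Haiku the Mermaid grammar in the prompt, is exactly the mechanical-extraction move the roadmap describes.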
- Move suggestion logic from top-level array to per-model next_action field
- next_action=null for confirmed/none models (no follow-up needed)
- next_action includes tool, columns, reason, priority for potential models
- Priority driven by code change type: modified+schema→high, downstream→medium
- Rename total_affected_row_count → max_affected_row_count (was computing max)
- Remove top-level suggested_deep_dives from response
- Update _guidance to teach next_action workflow
- Update E2E tests for field renames and next_action structure
- Update behavioral tests with next_action assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
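The priority rules named in this commit and its tests could be sketched as follows (the function signature is hypothetical; the low tier for downstream views comes from the behavioral tests below, and the fallback for modified-without-schema-change is an assumption):

```python
def next_action_priority(change_status: str, has_schema_change: bool,
                         materialized: str) -> str:
    """Pick a next_action priority from the code change type.

    modified + schema change -> high; downstream view -> low;
    other downstream models -> medium; default medium (assumed).
    """
    if change_status == "modified" and has_schema_change:
        return "high"
    if change_status == "downstream":
        return "low" if materialized == "view" else "medium"
    return "medium"
```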
…logic

- Confirmed low change ratio → next_action=None
- data_impact='none' → next_action=None
- next_action field completeness (tool, columns, reason, priority)
- Response uses max_affected_row_count, not total_affected or suggested_deep_dives
- Schema change next_action has priority=high
- Downstream view next_action has priority=low

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
PR A Implementation Complete

Two commits pushed to implement the design changes discussed above.

Commit 1: …
| Test | What it verifies |
|---|---|
| test_confirmed_low_change_ratio_has_null_next_action | confirmed + low change → null |
| test_none_data_impact_has_null_next_action | none (zero changes) → null |
| test_next_action_has_all_required_fields | tool, columns, reason, priority all present |
| test_response_uses_new_field_names | max_affected_row_count exists, old fields absent |
| test_next_action_schema_change (priority assert) | schema change → priority=high |
| test_next_action_downstream_view_low_priority | downstream view → priority=low |
All 67 tests pass (55 E2E + 11 behavioral + 1 registration).
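The invariant the first few behavioral tests pin down can be sketched as a single checker (helper and field names follow the response schema described in this PR; the real assertions live in TestImpactAnalysisBehavior):

```python
REQUIRED_NEXT_ACTION_FIELDS = ("tool", "columns", "reason", "priority")


def check_next_action_contract(model: dict) -> None:
    """Raise AssertionError if a model violates the next_action rules."""
    if model["data_impact"] in ("confirmed", "none"):
        # Confirmed/none models need no follow-up, so next_action must be null.
        assert model["next_action"] is None
    else:
        # Potential models may carry a next_action; if present, it's complete.
        action = model["next_action"]
        if action is not None:
            for field in REQUIRED_NEXT_ACTION_FIELDS:
                assert field in action, f"next_action missing {field}"
```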
Summary
Extends `impact_analysis` to run value_diff on downstream table models and adds a `data_impact` field so agents can distinguish DAG-reachable models from actually data-affected models.

Problem

- `confirmed_impacted` with `value_diff: null` — agents can't tell if data actually changed
- `total_affected_row_count=0` because downstream tables skip value_diff
- `_guidance` text says "DO NOT OVERRIDE" — forces agents to blindly trust noisy DAG classifications

Changes
| Commit | Change |
|---|---|
| `184d00e3` | Rename to `confirmed_impacted_models`, `confirmed_not_impacted_models`, `total_affected_row_count` |
| `4f698386` | Include views in row_count_diff |
| `45010042` | Add `data_impact` field (confirmed/none/potential) + rewrite `_guidance` |
| `a5e65b0e` | Handle `skip_downstream_value_diff` case |
| `b2d7e9bc` | Behavioral tests for `data_impact`, skip flags, and guidance |

New `data_impact` field

| data_impact | Meaning |
|---|---|
| `confirmed` | `affected_row_count > 0` |
| `none` | `affected_row_count == 0` |
| `potential` | `value_diff` is null |

When `data_impact="potential"`, `affected_row_count` is forced to `null` to avoid misleading row_count fallback values.

New `skip_downstream_value_diff` parameter

- `false`: run value_diff on ALL impacted table models (modified + downstream)
- `true`: skip downstream models (current behavior, opt-out for large DAGs)

Backward compatibility
All changes are additive — no existing fields removed, `skip_downstream_value_diff` defaults to `false`, and list names are unchanged.

Test plan

- Behavioral tests (`TestImpactAnalysisBehavior`)
- `data_impact` field present on all models in `confirmed_impacted_models`
- `data_impact="potential"` forces `affected_row_count=null`
- `skip_downstream_value_diff=true` skips downstream value_diff
- `skip_value_diff=true` takes precedence over `skip_downstream_value_diff`

🤖 Generated with Claude Code