fix(snapshot): harden --json output for CI consumers by hidai25 · Pull Request #186 · hidai25/eval-view

hidai25 · 2026-04-20T07:49:35Z

Summary

Follow-up to #182. Tightens the edges on evalview snapshot --json so the payload is actually consumable by CI.

- Clean stdout in JSON mode: plumb json_output into _execute_snapshot_tests and skip run_with_spinner. Previously per-test prints and Rich spinner frames wrote to the same stream as the JSON payload, so evalview snapshot --json | jq would fail on real runs.
- Accurate per-test saved / golden_file: _save_snapshot_results now returns a {name -> Path} map; the JSON builder keys off that instead of a global count. Before, a passing test whose save_golden raised would still appear as saved: true as long as at least one sibling saved.
- Real golden paths: use the Path returned by GoldenStore.save_golden (which is variant-aware and ends in .golden.json) instead of guessing {test_case}.yaml.
- --preview --json rejected upfront: emits a JSON error + ctx.exit(2) instead of silently dropping --json.
- Style: json import grouped with stdlib, indent=2 for consistency with skill/model-check --json, and the --json help text now documents the suppression + auto-approve behavior.
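The per-test tracking described above can be illustrated with a minimal sketch. This is not the code from the PR: `build_snapshot_payload`, the `tests` list, and the `name` key are hypothetical names chosen for illustration, and only the `saved` and `golden_file` fields come from the description. The point is that each test's status is derived from its own entry in the {name -> Path} map rather than from a global count.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_snapshot_payload(test_names: List[str], saved_paths: Dict[str, Path]) -> str:
    """Hypothetical JSON builder: a test counts as saved only if it has its own path."""
    tests = []
    for name in test_names:
        # A missing entry means the save for this test did not succeed,
        # regardless of how many sibling tests were saved.
        path = saved_paths.get(name)
        tests.append(
            {
                "name": name,
                "saved": path is not None,
                "golden_file": str(path) if path is not None else None,
            }
        )
    # indent=2 mirrors the pretty-printing mentioned in the Style item.
    return json.dumps({"tests": tests}, indent=2)
```

With this shape, a passing test whose save raised simply has no entry in the map, so it is reported with `saved: false` and no `golden_file`, even when other tests in the same run were written to disk.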

No change to the JSON schema.

Test plan

New tests/test_snapshot_json_output.py covers the contract:

- --json emits a single parseable JSON document on stdout with no Rich markup / banner / spinner leakage
- saved / golden_file reflect actual disk writes even when some saves raise
- golden_file uses the variant-aware GoldenStore path
- --preview --json exits non-zero with a JSON error payload
- Empty suite emits {"error": "no tests found"}
- Existing snapshot tests (test_snapshot_generated_workflow.py, test_e2e_snapshot_check.py) still pass: 25/25
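As a rough illustration of the first and fourth checks, the sketch below uses a stand-in function rather than the real CLI. It does not reproduce tests/test_snapshot_json_output.py: `run_snapshot`, `capture`, the placeholder payload, and the error message text are all assumptions. Only the behavior (suppressed progress output in JSON mode, a JSON error with exit code 2 for --preview --json) follows the description above.

```python
import contextlib
import io
import json
from typing import Tuple


def run_snapshot(json_output: bool, preview: bool = False) -> int:
    """Hypothetical stand-in for the snapshot command; returns an exit code."""
    if preview and json_output:
        # Reject the combination upfront with a machine-readable error.
        # The message text here is an assumption, not the PR's wording.
        print(json.dumps({"error": "--preview cannot be combined with --json"}, indent=2))
        return 2
    if not json_output:
        # Human-readable progress is only printed outside JSON mode.
        print("Running tests...")
    result = {"tests": []}  # placeholder payload, not the real schema
    if json_output:
        print(json.dumps(result, indent=2))
    else:
        print("Done.")
    return 0


def capture(json_output: bool, preview: bool = False) -> Tuple[int, str]:
    """Run the stand-in command and return (exit_code, captured_stdout)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        code = run_snapshot(json_output, preview)
    return code, buf.getvalue()
```

A contract-style check would then call `json.loads` on the entire captured stdout: if any progress line or spinner frame leaked into the stream, parsing fails, which is the same failure a CI consumer piping the output into jq would hit.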

https://claude.ai/code/session_01YPVyciLBGFEpKoMV7y8zFQ

Follow-up to Matt's --json flag:

- Keep stdout clean in JSON mode by plumbing json_output into _execute_snapshot_tests and running without the Rich live spinner. Previously per-test prints and spinner frames leaked into the payload, making stdout unparseable.
- Track the real set of saved tests via a {name -> Path} map so per-test `saved` and `golden_file` reflect actual disk writes; previously any passing test was reported as saved whenever at least one sibling saved, and the path guessed `.yaml` instead of the variant-aware `.golden.json` GoldenStore actually writes.
- Reject `--preview --json` upfront with a JSON error and non-zero exit rather than silently dropping --json.
- Pretty-print JSON (indent=2) to match `skill`/`model-check` output, tidy the --json help text to document the suppression and auto-approve behavior, and keep `import json` with the other stdlib imports.
- Cover the contract with tests/test_snapshot_json_output.py: parseable payload, accurate per-test saved/golden_file tracking (including partial save failures), variant-aware paths, --preview rejection, and the empty-suite error shape.

https://claude.ai/code/session_01YPVyciLBGFEpKoMV7y8zFQ

hidai25 merged commit c0d5758 into main Apr 20, 2026
7 checks passed

hidai25 mentioned this pull request Apr 20, 2026

feat: add --json flag to evalview snapshot #183

Closed

hidai25 deleted the claude/review-pr-182-r840b branch April 20, 2026 14:52

hidai25 merged 1 commit into main from claude/review-pr-182-r840b
