feat(cli-finding-classifier, cli-push-runner): Bundle i - required follow-ups before starting Phase d (ranks 91 + 92 + 93) by aloekun · Pull Request #135 · aloekun/claude-code-hook-test

aloekun · 2026-05-09T11:57:43Z

Summary

Lands, in a single PR, the three Bundle i items (required follow-ups before starting Phase d) adopted from the post-merge feedback on PR #132 (Phase c MVP).

- Rank 91 (Tier 2 #4 / Effort S): add `[lint_screen]` config parse tests to src/cli-push-runner/src/config.rs (5 tests; guards against silent field renames)
- Rank 92 (Tier 2 #5 / Effort M): add three scale-aware eval fixtures (eval13/14/15, 200+ lines), register their baselines in lint-screen-evals.json, rename the count test in lint_screen_evals.rs and relax its upper bound, and add a smoke test that checks the Bundle i fixtures are actually present
- Rank 93 (Tier 3 #8 / Effort XS): codify the "partial fix recurring outside the changed diff" anti-pattern in ~/.claude/rules/common/coding-style.md (cites PR #94 / #111 / #132 inline)

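Status of the Phase d prerequisites

With Bundle i landed, the following are in place:

| Prerequisite | Bundle i contribution |
|---|---|
| (a) Prevent silent failures in `[lint_screen]` config | Rank 91 |
| (b) Reproducible measurement of failure modes via scale-aware fixtures | Rank 92 |
| (c) Global rule for the cross-file partial fix anti-pattern | Rank 93 |

Next is the §8.D v4 prompt revision (an improvement loop for JSON completeness on large diffs), which is the final gate before starting Phase d.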

dogfood results (mistral:7b / temperature=0)

- agreement = 11/15 = 73.3% (a marginal drop from the Phase b' 75%; the fixtures reproduce the failure modes as designed)
- aggregate precision = 76.2%, recall = 51.6%
- latency p50 = 4591 ms, p95 = 8370 ms
- verdict: CONDITIONAL-GO (limited to the §8.E auto_fix lane)

Behavior of the Bundle i fixtures

| eval | scale | decision | findings recall | fallback | assessment |
|---|---|---|---|---|---|
| 13 | 5 files / 280 lines | ❌ auto_fix → human_review | 0/5 | `missing field 'screen_decision'` | As designed: reproduces the same kind of JSON schema breakdown as the PR #132 smoke run (868-line diff) |
| 14 | 3 files / 153 lines | ✅ auto_fix → auto_fix | 1/3 (33%) | None | Marked recall degradation at mid scale |
| 15 | 1 file / 208 lines | ❌ auto_fix → human_review | 0/5 | `missing field 'severity' at line 38` | As designed: newly captures nested field omission in a single long file |
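The fallback column above can be read as a simple rule: when the model's JSON fails to parse, the screen step records the error and routes the diff to human review. The sketch below illustrates that rule under stated assumptions. `Decision`, `ScreenOutcome`, and `resolve` are hypothetical names, and the `Result<Decision, String>` input stands in for whatever the real deserialization step returns; the PR does not show the actual implementation.

```rust
/// Lanes named in the PR's decision column.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Decision {
    AutoFix,
    HumanReview,
}

/// Hypothetical result of one lint-screen run.
#[derive(Debug, PartialEq)]
pub struct ScreenOutcome {
    pub decision: Decision,
    /// Parse error text when the fallback path was taken, otherwise None.
    pub fallback: Option<String>,
}

/// `parsed` stands in for the outcome of deserializing the model's JSON.
/// Assumption: any parse error falls back to the conservative lane.
pub fn resolve(parsed: Result<Decision, String>) -> ScreenOutcome {
    match parsed {
        Ok(decision) => ScreenOutcome { decision, fallback: None },
        Err(reason) => ScreenOutcome {
            decision: Decision::HumanReview,
            fallback: Some(reason),
        },
    }
}
```

Under this reading, eval13 and eval15 count as disagreements because the baseline expects auto_fix while the fallback yields human_review.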

Existing fixtures (12 from Phase b')

- 9/12 agree (the Phase b' 75% baseline is maintained)
- eval5 (multi-issue: human_review → auto_fix): the known disagreement (missed deep nesting) persists
- eval8/10/12: the known boundary-judgment disagreements persist

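Interpreting the 73.3% agreement

This lands in a form that satisfies the branch at todo6.md L164: "agreement stays at or above 75%, or the reason for falling short is documented."

- 73.3% < 75% is not a regression; it is the result of the stress test designed into the Bundle i fixtures (JSON schema breakdown at scale) becoming reproducible
- Data secured before Phase d: with this measurement, frequent fallbacks in real Phase d PRs can be predicted quantitatively
- The §8.D v4 prompt revision loop is split into a separate bundle (out of scope for this PR and consistent with feedback_minimize_pr_count_during_rate_limit; fixtures and prompt revision are separate responsibilities)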
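The arithmetic behind the 73.3% figure and the two-branch criterion quoted from todo6.md can be sketched as follows. The function names are hypothetical and the gate is only a reading of the quoted sentence, not code from the repository.

```rust
/// Agreement rate in percent: the share of evals whose decision matched the baseline.
pub fn agreement_pct(matched: u32, total: u32) -> f64 {
    if total == 0 {
        return 0.0;
    }
    f64::from(matched) / f64::from(total) * 100.0
}

/// Hypothetical encoding of the criterion: either the target is met,
/// or the reason for the shortfall has been documented.
pub fn gate_satisfied(agreement: f64, target_pct: f64, shortfall_documented: bool) -> bool {
    agreement >= target_pct || shortfall_documented
}
```

With 11 of 15 evals matching, the first branch fails against a 75% target, so the PR relies on the second branch by documenting the eval13/15 fallbacks.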

Test plan

- `cargo test -p cli-push-runner`: 53 tests pass (including the 5 new tests)
- `cargo test -p cli-finding-classifier --test lint_screen_evals`: 20 schema tests pass
- `cargo test -p cli-finding-classifier --test lint_screen_evals -- --ignored`: 1 end-to-end test passes (Ollama running / mistral:7b / 73.3% agreement / fallback rate 2/15 observed)
- `pnpm build:all`: all exes build successfully
- `takt pre-push-review`: 1 iteration / APPROVE (both simplicity-review and security-review APPROVE)

Out of scope

- §8.D v4 prompt revision (the final gate before starting Phase d; handled in a separate bundle)
- Phase d (PR-based dogfood) (to be decided after Bundle i and §8.D v4 land)
- Deployment to derived projects (techbook-ledger / auto-review-fix-vc) (this PR only prepares infrastructure for the default-OFF trial lane)

Summary by CodeRabbit

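Release notes

New Features
- Added authentication (including password hashing, session management, and token issuance).
- Strengthened error handling (error context, retry policy, timeout handling).
- Implemented parsing of Markdown-style documents (section extraction, metadata, outline generation).
Tests
- Expanded the evaluation dataset and added or updated related tests (stronger config unit tests and eval-set validation).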

…required follow-ups (ranks 91 + 92 + 93)

Lands, in a single PR, the three items adopted from the post-merge feedback on PR #132 (Phase c MVP landed).

## Rank 91 (Tier 2 #4): [lint_screen] config parse test

- Add 5 tests to src/cli-push-runner/src/config.rs
- Unit tests prevent the failure mode where a silent field rename or addition falls back to None
- Five independently verified axes: full fields / minimal only enabled / absent yields None / numeric defaults / string defaults

## Rank 92 (Tier 2 #5): scale-aware eval fixtures (200+ lines)

- eval13-large-refactor-real.diff (5 files / 280 lines): context limit + JSON completeness
- eval14-mid-mixed.diff (3 files / 153 lines): mid-scale recall stability
- eval15-syntax-stress.diff (1 file / 208 lines): schema completeness for a single long file
- Add baselines for ids 13/14/15 to lint-screen-evals.json (auto_fix lane, 13 findings in total)
- Rename the count test and relax its upper bound (eval_set_loads_and_has_at_least_phase_b_prime_baseline_count)
- Add the Bundle i smoke test (eval_set_includes_bundle_i_scale_aware_fixtures)

### dogfood results (mistral:7b / temperature=0)

agreement = 11/15 = 73.3% (a marginal drop from the Phase b' 75%; the fixtures reproduce the failure modes as designed)

- eval13 (280 lines): JSON parse error 'missing field screen_decision' → fallback path triggered; this decisively reproduces the failure mode observed in the PR #132 smoke run (868-line diff)
- eval15 (208 lines): JSON parse error 'missing field severity at line 38'; this newly captures a different failure mode, nested field omission
- eval14 (153 lines): the JSON is complete but recall is 33% (only 1 TP out of 3 baseline findings)

aggregate precision=76.2% recall=51.6% latency p50=4591ms p95=8370ms verdict CONDITIONAL-GO

The reason agreement fell below 75% can be explained mechanically by the eval13/15 fallbacks (the fixtures working as designed). The measurement required before Phase d has been obtained (satisfying the todo6.md L164 branch "the reason for falling short is documented"). The §8.D v4 prompt revision is split into a separate bundle.

## Rank 93 (Tier 3 #8): codify the partial fix anti-pattern in coding-style.md

- Add the "partial fix recurring outside the changed diff" anti-pattern to § Cross-File Reference Lifecycle of ~/.claude/rules/common/coding-style.md
- Cite PR #94 / #111 / #132 inline (evidence-based)
- Spell out the remedy of searching every path for the family_tag with grep -rn, and the option of deliberately carving out a partial fix

## Phase d prerequisites update

With Bundle i landed, the following are in place:

- (a) Prevent silent failures in [lint_screen] config (rank 91)
- (b) Reproducible measurement of failure modes via scale-aware fixtures (rank 92)
- (c) Global rule for the cross-file partial fix anti-pattern (rank 93)

Next is the loop that improves JSON completeness on large diffs through the §8.D v4 prompt revision (the final gate before starting Phase d).

coderabbitai · 2026-05-09T11:57:54Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 582dc8a4-9de7-4533-ba07-4755369c1946

📥 Commits

Reviewing files that changed from the base of the PR and between 6ba83a0 and b09cf6e.

📒 Files selected for processing (1)

src/cli-finding-classifier/tests/lint_screen_evals.rs

🚧 Files skipped from review as they are similar to previous changes (1)

src/cli-finding-classifier/tests/lint_screen_evals.rs

📝 Walkthrough

Adds diff fixtures for an auth module, error-handling utilities, and a Markdown-style parser; adds three evals to the lint-screen eval dataset; and updates the related test validation and the LintScreenConfig unit tests.

Changes

Auth module eval

| Layer / File(s) | Summary |
|---|---|
| Auth type definitions `src/cli-finding-classifier/evals/files/eval13-large-refactor-real.diff` | Adds implementations of `AuthService`, `AuthError`, `PasswordHash`, `PasswordPolicy`, `Session`, and `Token`, plus their re-export in src/lib.rs. |

Error handling eval

| Layer / File(s) | Summary |
|---|---|
| Error types and utilities `src/cli-finding-classifier/evals/files/eval14-mid-mixed.diff` | Adds `AppError`/`AppResult`, `ErrorContext`, `RetryPolicy` and `run_with_retry` (exponential backoff), I/O error classification, and recoverability checks. |

Parser module eval

| Layer / File(s) | Summary |
|---|---|
| Parser type definitions `src/cli-finding-classifier/evals/files/eval15-syntax-stress.diff` | Adds `ParsedDocument`, `Section`, and `ParseError`. |
| Parser implementation `src/cli-finding-classifier/evals/files/eval15-syntax-stress.diff` | Adds `parse_document` (fence handling, title/section parsing, metadata), `leading_hash_count`, `extract_anchors`, `render_outline`, `count_sections_at_level`, and `truncate_to_max_sections`. |

Eval infrastructure

| Layer / File(s) | Summary |
|---|---|
| Eval dataset `src/cli-finding-classifier/evals/lint-screen-evals.json` | Adds three new eval cases (id:13–15) with their baselines, findings, and expectations. |
| Test fixture update `src/cli-finding-classifier/tests/lint_screen_evals.rs` | Relaxes the fixture count to "at least 15", adds a guard test for Bundle i, and updates the ignored banner text. |
| Config tests `src/cli-push-runner/src/config.rs` | Adds unit tests that verify the full, minimal, and default behavior of the `[lint_screen]` section. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

aloekun/claude-code-hook-test#131: related changes to the cli-finding-classifier eval fixtures and test harness.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title describes the three prioritized follow-up changes (ranks 91, 92, 93) and is relevant to the main changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cli-finding-classifier/tests/lint_screen_evals.rs`:
- Around line 239-243: The assertion using assert!(set.evals.len() >= 12, ...)
is too permissive; lock the current baseline by asserting the exact expected
count so regressions are caught. Replace the assert! with an equality check
against the current baseline (use assert_eq!(set.evals.len(), 12, ...)) and
update the message to reflect that 12 is the fixed baseline for Phase b' (adjust
message text accordingly); locate and change the assert that references
set.evals.len() in this test.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d387faf4-1b92-423e-b6c4-bfc2cb4311ad

📥 Commits

Reviewing files that changed from the base of the PR and between ccca017 and 6ba83a0.

📒 Files selected for processing (6)

src/cli-finding-classifier/evals/files/eval13-large-refactor-real.diff
src/cli-finding-classifier/evals/files/eval14-mid-mixed.diff
src/cli-finding-classifier/evals/files/eval15-syntax-stress.diff
src/cli-finding-classifier/evals/lint-screen-evals.json
src/cli-finding-classifier/tests/lint_screen_evals.rs
src/cli-push-runner/src/config.rs

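…nt lower bound fixed at the Bundle i baseline of 15. With >= 12, deleting existing fixtures would go undetected, so the check was changed to >= 15 to prevent regressions. Future fixture additions (> 15) are allowed.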
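The trade-off between the three candidate guards (the original `>= 12`, an exact match as the reviewer suggested, and the adopted `>= 15`) can be sketched with plain predicates. The function and constant names here are hypothetical; only the thresholds come from the review thread and the commit message.

```rust
/// Fixture count of the Bundle i baseline (evals 1 through 15).
pub const BUNDLE_I_BASELINE: usize = 15;

/// Original guard: still passes after up to three fixtures are deleted from 15.
pub fn loose_guard(count: usize) -> bool {
    count >= 12
}

/// Exact-match guard: catches deletions but also fails on every legitimate addition.
pub fn exact_guard(count: usize) -> bool {
    count == BUNDLE_I_BASELINE
}

/// Adopted guard: catches any deletion below the baseline and tolerates additions.
pub fn baseline_guard(count: usize) -> bool {
    count >= BUNDLE_I_BASELINE
}
```

For a set that has lost one fixture (14), only `loose_guard` still passes; for a set that has gained one (16), only `exact_guard` fails, which is the behavior the commit chose to avoid.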

aloekun · 2026-05-09T13:11:59Z

@coderabbitai review

coderabbitai · 2026-05-09T13:12:05Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…(Phase d prerequisites met) (#136)

* docs(local-llm-offload-analysis): reflect Bundle i completion (PR #135 landed)

- Synced 4 places: status banner / §1 Phase c+ Bundle i / §2 §8.E / §4 restart checklist
- Recorded the dogfood results (73.3% / fallback 2/15) and the status of the Phase d prerequisites
- Narrowed the next final gate to "a first-line fix for the eval13/15 JSON completeness problem via the §8.D v4 prompt revision"

* fix(lib-ollama-client): raise default num_ctx to 8192 - start §8.D (root cause: prompt truncation)

## Background

In the PR #135 (Bundle i) dogfood, eval13 (5 files / 280 lines) and eval15 (1 file / 208 lines) hit a JSON schema breakdown:

- eval13: 'missing field screen_decision' (top-level field omitted)
- eval15: 'missing field severity at line 38' (nested field omitted)

The todo originally named §8.D the 'v4 prompt revision loop', but empirical verification of the root cause (raw Ollama output dump) showed that the cause was the client configuration, not the prompt design.

## Empirical verification (raw output dump)

- eval13: prompt_eval_count=4096 (hit the Ollama default num_ctx limit) → the schema definition section was truncated out of the context window, and the model generated unrelated directory-tree JSON.
- eval15: prompt_eval_count=4096 → part of the schema survived, and the model emitted similar field names (rule_id / message / start_line).

The prompt_eval_count of both evals is identical (= the context window cap), so a prompt revision cannot fix this.

## Fix

- Add a num_ctx field to OllamaClient plus DEFAULT_NUM_CTX = 8192 (mistral:7b supports 32K in theory; 8192 balances safety margin against inference cost)
- Serialize num_ctx in GenerateOptions
- Add a with_num_ctx builder (for future use with even longer prompts)
- tests: num_ctx_defaults_and_overrides_apply / num_ctx_is_serialized_into_request_body (asserts with mockito that the request body contains num_ctx:8192)

## dogfood results

| Metric | Before (default) | After (num_ctx=8192) |
|---|---|---|
| agreement rate | 11/15 = 73.3% | 13/15 = 86.7% |
| eval13 fallback | screen_decision missing | none (decision match) |
| eval15 fallback | severity missing | none (decision match) |
| verdict | CONDITIONAL-GO | GO (start §8.E) |

The 2 remaining disagreements (eval5 / eval10) are boundary judgments known since Phase b' and unrelated to num_ctx (a limit on the LLM side, with room for asymptotic improvement).

## Phase d prerequisites complete

With Bundle i and this commit, the following 4 elements are in place:

- (a) Prevent silent failures in [lint_screen] config (PR #135 rank 91)
- (b) Reproducible measurement via scale-aware fixtures (PR #135 rank 92)
- (c) Global rule for the cross-file partial fix anti-pattern (PR #135 rank 93)
- (d) First-line fix for the JSON completeness problem (this commit, num_ctx 8192)

Next is the decision on starting Phase d (PR-based dogfood in a real environment).

* fix(lib-ollama-client): reject 0 in with_num_ctx (CodeRabbit Minor #r3213377294)

num_ctx = 0 causes a runtime error in the Ollama API, so the builder fails fast with assert!.

- Added a Panics doc note to the with_num_ctx rustdoc
- Sealed with a #[should_panic(expected = "...")] test (with_num_ctx_panics_on_zero)

Adopted from CodeRabbit PR #136 review #r3213377294. #r3213377290 from the same review (pinning the test pass count in the docs) was optional at the user's discretion → skipped.
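The client-side change described in this commit message can be sketched roughly as below. The names `OllamaClient`, `DEFAULT_NUM_CTX`, and `with_num_ctx` come from the commit message. The rest is assumed for illustration: the field layout, the panic message, the hand-rolled `options_json` (the real crate serializes `GenerateOptions`), and the `prompt_hit_context_cap` helper.

```rust
/// Default context window, as stated in the commit message.
pub const DEFAULT_NUM_CTX: u32 = 8192;

/// Minimal stand-in for the client; the real struct has more fields.
#[derive(Debug, Clone)]
pub struct OllamaClient {
    num_ctx: u32,
}

impl Default for OllamaClient {
    fn default() -> Self {
        Self { num_ctx: DEFAULT_NUM_CTX }
    }
}

impl OllamaClient {
    pub fn new() -> Self {
        Self::default()
    }

    /// Overrides the context window.
    ///
    /// # Panics
    /// Panics if `num_ctx` is 0, mirroring the fail-fast behavior the commit describes.
    pub fn with_num_ctx(mut self, num_ctx: u32) -> Self {
        assert!(num_ctx > 0, "num_ctx must be greater than 0");
        self.num_ctx = num_ctx;
        self
    }

    pub fn num_ctx(&self) -> u32 {
        self.num_ctx
    }

    /// Hand-rolled stand-in for the serialized `options` object (assumption).
    pub fn options_json(&self) -> String {
        format!("{{\"num_ctx\":{}}}", self.num_ctx)
    }
}

/// Hypothetical helper: a prompt token count that reaches the context window
/// suggests the prompt was truncated, as observed for eval13 and eval15.
pub fn prompt_hit_context_cap(prompt_eval_count: u32, num_ctx: u32) -> bool {
    prompt_eval_count >= num_ctx
}
```

With the observed `prompt_eval_count` of 4096, the check fires against a 4096-token window and no longer fires against 8192, which matches the before/after table in the commit message.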

…eanup (#137)

* docs: reflect PR #135/#136 landing - add rank 97 + stale-doc cleanup after the Phase d gate was cleared

Consolidates, in one PR, the doc updates that follow from post-merge feedback and state reconciliation after PR #135 (Bundle i) and PR #136 (§8.D / num_ctx 8192) were merged.

## Add rank 97 (adopts PR #136 T2-#1)

Registers in the todo the Tier 2 #1 item adopted (✅) in the PR #136 post-merge feedback: a test that verifies serialization of the `with_num_ctx(X)` override value (Effort S / Adoption Risk None).

- docs/todo-summary.md: add the rank 97 row to the table (Tier 2 / S / no dependencies)
- docs/todo6.md: add the detailed entry (motivation / design decisions / work plan / completion criteria)

This is a test gap found independently by the post-merge-feedback agent, not a CodeRabbit nitpick. The existing num_ctx_is_serialized_into_request_body only verifies the default value (8192), so it cannot catch a silent degradation if the with_num_ctx(X) wiring breaks.

## Stale-doc cleanup after the Phase d gate was cleared

Because §8.D (= num_ctx 8192) landed in PR #136, the places in analysis.md that described it as "the final gate before starting Phase d" became stale. Updated 4 places:

- L5 (status banner): add a note that §8.D is complete; rewrite 'final gate' as 'awaiting kickoff'
- L129 (§1 Phase c+ remaining final gate): rewrite as 'landed' and preserve the history of the root cause pivot
- L153 (§2 §8.E Phase d prerequisites): ✅ done mark + pivot history
- L185 (§4 restart checklist): '§8.D v4 revision loop' → 'Phase d dogfood'
- L200 (§4 what to do next): restructure into 3 stages: Phase d kickoff prep / real dogfood / result aggregation

## Phase d prerequisites check

- (a) Prevent silent failures in [lint_screen] config ✅ PR #135
- (b) Reproducible measurement via scale-aware fixtures ✅ PR #135
- (c) Global rule for the cross-file partial fix anti-pattern ✅ PR #135
- (d) First-line fix for the JSON completeness problem ✅ PR #136 (num_ctx 8192)

→ Phase d kickoff is possible. The dogfood itself is long-running, across 3-5 real PRs.

* docs(todo-summary): sync the update date of the recommended execution order summary to 2026-05-10 (CodeRabbit Minor #r3213575272)

Rank 97 was added on 2026-05-10, but the "(updated 2026-04-29)" in the heading had gone stale, so it was fixed. External references go through the adjacent `<a id="recommended-order-summary"></a>`, so the anchor does not break (in line with `coding-style.md` § dated heading anchors: prefer stable identifiers).

Adopted from CodeRabbit PR #137 review #r3213575272.
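The test gap behind rank 97 can be illustrated with two stand-in request builders: one that wires the configured value through, and one that silently sends the default. All names below are hypothetical, and the JSON is hand-formatted rather than produced by the real client; the point is only that a default-only check cannot tell the two apart.

```rust
/// Default context window, as stated in the PR #136 commit message.
pub const DEFAULT_NUM_CTX: u32 = 8192;

/// Correct wiring: the configured value reaches the request body.
pub fn body_wired(num_ctx: u32) -> String {
    format!("{{\"options\":{{\"num_ctx\":{}}}}}", num_ctx)
}

/// Broken wiring: the configured value is ignored and the default is sent.
pub fn body_broken(_num_ctx: u32) -> String {
    format!("{{\"options\":{{\"num_ctx\":{}}}}}", DEFAULT_NUM_CTX)
}

/// Default-only check, in the spirit of the existing test: passes for both builders.
pub fn default_check(build: fn(u32) -> String) -> bool {
    build(DEFAULT_NUM_CTX).contains("\"num_ctx\":8192")
}

/// Override check: passes only when a non-default value is actually serialized.
pub fn override_check(build: fn(u32) -> String) -> bool {
    build(16384).contains("\"num_ctx\":16384")
}
```

Only `override_check` distinguishes `body_broken` from `body_wired`, which is the silent degradation the new todo entry aims to catch.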

coderabbitai Bot reviewed May 9, 2026

Comment thread src/cli-finding-classifier/tests/lint_screen_evals.rs

fix(cli-finding-classifier): CodeRabbit Major #r3213115045 — eval cou…

b09cf6e


aloekun merged commit 17ae756 into master May 9, 2026
1 check passed

aloekun deleted the feature/bundle-i-phase-d-prep branch May 9, 2026 13:20

aloekun mentioned this pull request May 9, 2026

fix(lib-ollama-client): raise num_ctx to 8192 - start §8.D + reflect Bundle i completion (Phase d prerequisites met) #136

Merged

4 tasks

aloekun mentioned this pull request May 9, 2026

docs: reflect PR #135/#136 landing - add rank 97 + stale-doc cleanup after the Phase d gate was cleared #137

Merged

3 tasks

aloekun merged 2 commits into master from feature/bundle-i-phase-d-prep
