test: Verify eval workflow runs on PR by saadshahd · Pull Request #6 · saadshahd/moo.md

saadshahd · 2025-12-16T21:09:44Z

Purpose

Test that the eval workflow runs correctly now that it exists on main.

Changes

Added CI notes to eval/README.md documenting workflow validation behavior

Expected

Eval workflow should trigger and run 5 tests
Tests may pass or fail (that's fine, we just want to verify the workflow runs)

Document that workflow changes only take effect after merge due to GitHub's workflow validation security feature.

claude · 2025-12-16T21:11:08Z

PR Review

Summary

This PR adds helpful CI documentation to eval/README.md explaining GitHub's workflow validation security feature. The changes are accurate and improve developer experience.

✅ Strengths

Valuable documentation: Addresses a common confusion point about why eval workflows don't run on the PR that introduces them
Clear structure: Follows existing README format with a dedicated "## CI Notes" section
Concise writing: Gets the point across in two bullet points without unnecessary detail
Commit message: Good conventional commit format ("docs(eval): Add CI security notes")

⚠️ Issues Found

1. Missing CHANGELOG.md Entry (Required per CLAUDE.md)

Per the repo conventions in CLAUDE.md:

IMPORTANT: Before any commit, check if CHANGELOG.md needs an entry. If the change is user-facing (new feature, fix, breaking change), add it.

This documentation addition is user-facing and should be tracked. Add to CHANGELOG.md under [Unreleased]:

### Changed
- **eval**: Added CI notes documenting GitHub workflow validation behavior

2. Minor Style Consideration

The wording "First-time workflow" could be clearer. Consider:

- **New workflows**: When adding a workflow via PR, it won't run on that PR due to GitHub security validation. It starts working on subsequent PRs after merge.

📋 Checklist

✅ Code quality: Documentation is clear and accurate
✅ Best practices: Follows existing README structure
✅ Potential bugs: N/A (documentation only)
✅ Performance: N/A (documentation only)
✅ Security: N/A (documentation only)
✅ Test coverage: N/A (documentation only)
❌ Changelog updated: Missing entry in CHANGELOG.md

🎯 Recommendation

Request Changes - Please add a CHANGELOG.md entry before merging. Once that's added, this is good to merge.

The documentation itself is solid and will help future contributors understand why their eval workflow changes don't immediately take effect.

greptile-apps · 2025-12-16T21:11:31Z

Greptile Overview

Greptile Summary

Added documentation to eval/README.md explaining GitHub's security validation behavior for workflow files. The new CI Notes section clarifies why the eval workflow won't run on the PR that introduces it, helping developers understand expected behavior when adding or modifying workflows.

Added CI Notes section with clear explanation of first-time workflow behavior
Documented that workflow changes only take effect after merge
Follows repository documentation standards

Confidence Score: 5/5

This PR is safe to merge with no risk
Documentation-only change that adds helpful context about GitHub workflow behavior. No code changes, no logic changes, and the content is accurate and well-formatted.
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
eval/README.md	5/5	Added CI Notes section documenting GitHub workflow validation behavior

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant PR as Pull Request
    participant GH as GitHub Actions
    participant Main as Main Branch
    
    Dev->>PR: Create PR with eval workflow
    PR->>GH: Trigger workflow check
    GH-->>PR: Security validation: Skip (new workflow)
    Note over GH,PR: Workflow doesn't run on first PR
    
    Dev->>Main: Merge PR to main
    Main->>GH: Workflow now available
    
    Dev->>PR: Create subsequent PR
    PR->>GH: Trigger eval workflow
    GH->>GH: Run 5 eval tests
    GH-->>PR: Report results

greptile-apps

_{1 file reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

- Use plugin_marketplaces and plugins inputs instead of claude_args - Add CRITICAL prompt section to prevent skill simulation - Install all 5 plugins from local marketplace

claude · 2025-12-16T21:15:34Z

PR Review: Verify eval workflow runs on PR

Overview

This PR updates the eval workflow configuration and documentation to clarify CI behavior and improve plugin installation approach.

✅ Strengths

Better Plugin Installation Pattern
- Switching from --plugin-dir to plugin_marketplaces + plugins list (.github/workflows/eval.yml:61-67) is the correct approach
- Aligns with how users will actually install plugins via marketplace
- More realistic testing environment
Critical Clarifications
- CRITICAL section addition (.github/workflows/eval.yml:49-52) prevents false positives from skill simulation
- Explicit requirement that skills must actually trigger via Skill tool is essential
Helpful CI Documentation
- eval/README.md:69-72 documents GitHub's security validation behavior
- Saves future confusion about why workflow doesn't run on first PR

🔍 Observations & Questions

1. Workflow Trigger Scope

The current paths filter (.github/workflows/eval.yml:5-11) only triggers on plugin directories. Consider:

Should changes to .github/workflows/eval.yml itself trigger the workflow?
Should changes to eval/cases/ trigger tests?

Recommendation: Add these paths to ensure the workflow validates its own changes:

paths:
  - 'hope/**'
  - 'product/**'
  - 'wordsmith/**'
  - 'founder/**'
  - 'career/**'
  - 'eval/**'
  - '.github/workflows/eval.yml'  # Self-validation

2. Missing CHANGELOG Entry

Per CLAUDE.md:

IMPORTANT: Before any commit, check if CHANGELOG.md needs an entry. If the change is user-facing (new feature, fix, breaking change), add it.

This PR makes a breaking change to the workflow configuration (changing plugin installation method). Should add:

## [Unreleased]

### Changed
- **eval**: Workflow now uses `plugin_marketplaces` instead of `--plugin-dir` for more realistic testing
- **eval**: Added CRITICAL section to eval prompt requiring actual skill triggering
- **eval**: Documented GitHub workflow validation behavior in README

3. Test Coverage

What happens if:

A plugin is added to the test matrix but not to the plugins: list?
The marketplace.json is malformed?

Consider adding validation step before running tests.

🎯 Code Quality

Style: Follows YAML conventions ✅
Clarity: Changes are well-documented ✅
Correctness: Plugin installation approach is correct ✅

🔒 Security

No security concerns. Changes are configuration-only and use existing GitHub Actions patterns.

📊 Performance

No performance impact. The plugin installation method change may be slightly slower (marketplace resolution) but more accurate for testing.

✏️ Suggested Changes

Add CHANGELOG entry (per CLAUDE.md requirements)
Consider expanding workflow trigger paths (see observation Add Claude Code GitHub Workflow #1)
Add validation step to check plugins list matches test matrix

📝 Verdict

🟡 SHIP + MONITOR

Confidence: 85%

Rationale:

Core changes are solid and improve testing accuracy
Missing CHANGELOG entry violates repo conventions
Workflow trigger scope could be broader for better coverage

Reversible: Yes - configuration change only
Key Assumption: The marketplace plugin resolution works correctly in CI environment
Complexity: Low - straightforward config update

Action Items:

Add CHANGELOG entry before merge
Verify workflow actually runs after merge (as documented in the new CI Notes)
Consider expanding trigger paths in follow-up PR

Addresses the core bottleneck: AI output exceeds human verification capacity. presence skill: - Surfaces decisions made, assumptions, uncertainties, verification needed - Triggers on "lost track", "overwhelmed", "what just happened" - Frequency calibration based on context length - Integration with soul, gate, trace, shape verify skill: - Machine-verifiable acceptance criteria (moo principle #6) - Phase 1: Lock boolean criteria before building - Phase 2: Execute verification after building - agent-browser CLI integration for browser checks - Vague-to-boolean conversion table Both skills are [A]ctivation type - Claude knows these concepts but needs the structured framework to apply them consistently. https://claude.ai/code/session_01BE49Z9NJRgE43NSVPV4GYq

docs(eval): Add CI security notes

56650bc

Document that workflow changes only take effect after merge due to GitHub's workflow validation security feature.

greptile-apps bot reviewed Dec 16, 2025

View reviewed changes

fix(eval): Use marketplace for plugin loading

b683f1e

- Use plugin_marketplaces and plugins inputs instead of claude_args - Add CRITICAL prompt section to prevent skill simulation - Install all 5 plugins from local marketplace

saadshahd closed this Dec 16, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: Verify eval workflow runs on PR#6

test: Verify eval workflow runs on PR#6
saadshahd wants to merge 2 commits intomainfrom
test/verify-eval-workflow

saadshahd commented Dec 16, 2025

Uh oh!

claude bot commented Dec 16, 2025

Uh oh!

greptile-apps bot commented Dec 16, 2025

Uh oh!

greptile-apps bot left a comment

Uh oh!

claude bot commented Dec 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

saadshahd commented Dec 16, 2025

Purpose

Changes

Expected

Uh oh!

claude bot commented Dec 16, 2025

PR Review

Summary

✅ Strengths

⚠️ Issues Found

1. Missing CHANGELOG.md Entry (Required per CLAUDE.md)

2. Minor Style Consideration

📋 Checklist

🎯 Recommendation

Uh oh!

greptile-apps bot commented Dec 16, 2025

Greptile Overview

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

claude bot commented Dec 16, 2025

PR Review: Verify eval workflow runs on PR

Overview

✅ Strengths

🔍 Observations & Questions

1. Workflow Trigger Scope

2. Missing CHANGELOG Entry

3. Test Coverage

🎯 Code Quality

🔒 Security

📊 Performance

✏️ Suggested Changes

📝 Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant