Skip to content

test: Verify eval workflow runs on PR#6

Closed
saadshahd wants to merge 2 commits intomainfrom
test/verify-eval-workflow
Closed

test: Verify eval workflow runs on PR#6
saadshahd wants to merge 2 commits intomainfrom
test/verify-eval-workflow

Conversation

@saadshahd
Copy link
Copy Markdown
Owner

Purpose

Test that the eval workflow runs correctly now that it exists on main.

Changes

  • Added CI notes to eval/README.md documenting workflow validation behavior

Expected

  • Eval workflow should trigger and run 5 tests
  • Tests may pass or fail (that's fine, we just want to verify the workflow runs)

Document that workflow changes only take effect after merge
due to GitHub's workflow validation security feature.
@claude
Copy link
Copy Markdown

claude bot commented Dec 16, 2025

PR Review

Summary

This PR adds helpful CI documentation to eval/README.md explaining GitHub's workflow validation security feature. The changes are accurate and improve developer experience.

✅ Strengths

  1. Valuable documentation: Addresses a common confusion point about why eval workflows don't run on the PR that introduces them
  2. Clear structure: Follows existing README format with a dedicated "## CI Notes" section
  3. Concise writing: Gets the point across in two bullet points without unnecessary detail
  4. Commit message: Good conventional commit format ("docs(eval): Add CI security notes")

⚠️ Issues Found

1. Missing CHANGELOG.md Entry (Required per CLAUDE.md)

Per the repo conventions in CLAUDE.md:

IMPORTANT: Before any commit, check if CHANGELOG.md needs an entry. If the change is user-facing (new feature, fix, breaking change), add it.

This documentation addition is user-facing and should be tracked. Add to CHANGELOG.md under [Unreleased]:

### Changed
- **eval**: Added CI notes documenting GitHub workflow validation behavior

2. Minor Style Consideration

The wording "First-time workflow" could be clearer. Consider:

- **New workflows**: When adding a workflow via PR, it won't run on that PR due to GitHub security validation. It starts working on subsequent PRs after merge.

📋 Checklist

  • ✅ Code quality: Documentation is clear and accurate
  • ✅ Best practices: Follows existing README structure
  • ✅ Potential bugs: N/A (documentation only)
  • ✅ Performance: N/A (documentation only)
  • ✅ Security: N/A (documentation only)
  • ✅ Test coverage: N/A (documentation only)
  • Changelog updated: Missing entry in CHANGELOG.md

🎯 Recommendation

Request Changes - Please add a CHANGELOG.md entry before merging. Once that's added, this is good to merge.

The documentation itself is solid and will help future contributors understand why their eval workflow changes don't immediately take effect.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Dec 16, 2025

Greptile Overview

Greptile Summary

Added documentation to eval/README.md explaining GitHub's security validation behavior for workflow files. The new CI Notes section clarifies why the eval workflow won't run on the PR that introduces it, helping developers understand expected behavior when adding or modifying workflows.

  • Added CI Notes section with clear explanation of first-time workflow behavior
  • Documented that workflow changes only take effect after merge
  • Follows repository documentation standards

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • Documentation-only change that adds helpful context about GitHub workflow behavior. No code changes, no logic changes, and the content is accurate and well-formatted.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
eval/README.md 5/5 Added CI Notes section documenting GitHub workflow validation behavior

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant PR as Pull Request
    participant GH as GitHub Actions
    participant Main as Main Branch
    
    Dev->>PR: Create PR with eval workflow
    PR->>GH: Trigger workflow check
    GH-->>PR: Security validation: Skip (new workflow)
    Note over GH,PR: Workflow doesn't run on first PR
    
    Dev->>Main: Merge PR to main
    Main->>GH: Workflow now available
    
    Dev->>PR: Create subsequent PR
    PR->>GH: Trigger eval workflow
    GH->>GH: Run 5 eval tests
    GH-->>PR: Report results
Loading

Copy link
Copy Markdown

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

- Use plugin_marketplaces and plugins inputs instead of claude_args
- Add CRITICAL prompt section to prevent skill simulation
- Install all 5 plugins from local marketplace
@claude
Copy link
Copy Markdown

claude bot commented Dec 16, 2025

PR Review: Verify eval workflow runs on PR

Overview

This PR updates the eval workflow configuration and documentation to clarify CI behavior and improve plugin installation approach.


Strengths

  1. Better Plugin Installation Pattern

    • Switching from --plugin-dir to plugin_marketplaces + plugins list (.github/workflows/eval.yml:61-67) is the correct approach
    • Aligns with how users will actually install plugins via marketplace
    • More realistic testing environment
  2. Critical Clarifications

    • CRITICAL section addition (.github/workflows/eval.yml:49-52) prevents false positives from skill simulation
    • Explicit requirement that skills must actually trigger via Skill tool is essential
  3. Helpful CI Documentation

    • eval/README.md:69-72 documents GitHub's security validation behavior
    • Saves future confusion about why workflow doesn't run on first PR

🔍 Observations & Questions

1. Workflow Trigger Scope

The current paths filter (.github/workflows/eval.yml:5-11) only triggers on plugin directories. Consider:

  • Should changes to .github/workflows/eval.yml itself trigger the workflow?
  • Should changes to eval/cases/ trigger tests?

Recommendation: Add these paths to ensure the workflow validates its own changes:

paths:
  - 'hope/**'
  - 'product/**'
  - 'wordsmith/**'
  - 'founder/**'
  - 'career/**'
  - 'eval/**'
  - '.github/workflows/eval.yml'  # Self-validation

2. Missing CHANGELOG Entry

Per CLAUDE.md:

IMPORTANT: Before any commit, check if CHANGELOG.md needs an entry. If the change is user-facing (new feature, fix, breaking change), add it.

This PR makes a breaking change to the workflow configuration (changing plugin installation method). Should add:

## [Unreleased]

### Changed
- **eval**: Workflow now uses `plugin_marketplaces` instead of `--plugin-dir` for more realistic testing
- **eval**: Added CRITICAL section to eval prompt requiring actual skill triggering
- **eval**: Documented GitHub workflow validation behavior in README

3. Test Coverage

What happens if:

  • A plugin is added to the test matrix but not to the plugins: list?
  • The marketplace.json is malformed?

Consider adding validation step before running tests.


🎯 Code Quality

  • Style: Follows YAML conventions ✅
  • Clarity: Changes are well-documented ✅
  • Correctness: Plugin installation approach is correct ✅

🔒 Security

No security concerns. Changes are configuration-only and use existing GitHub Actions patterns.


📊 Performance

No performance impact. The plugin installation method change may be slightly slower (marketplace resolution) but more accurate for testing.


✏️ Suggested Changes

  1. Add CHANGELOG entry (per CLAUDE.md requirements)
  2. Consider expanding workflow trigger paths (see observation Add Claude Code GitHub Workflow #1)
  3. Add validation step to check plugins list matches test matrix

📝 Verdict

🟡 SHIP + MONITOR

Confidence: 85%

Rationale:

  • Core changes are solid and improve testing accuracy
  • Missing CHANGELOG entry violates repo conventions
  • Workflow trigger scope could be broader for better coverage

Reversible: Yes - configuration change only
Key Assumption: The marketplace plugin resolution works correctly in CI environment
Complexity: Low - straightforward config update


Action Items:

  1. Add CHANGELOG entry before merge
  2. Verify workflow actually runs after merge (as documented in the new CI Notes)
  3. Consider expanding trigger paths in follow-up PR

@saadshahd saadshahd closed this Dec 16, 2025
saadshahd pushed a commit that referenced this pull request Feb 2, 2026
Addresses the core bottleneck: AI output exceeds human verification capacity.

presence skill:
- Surfaces decisions made, assumptions, uncertainties, verification needed
- Triggers on "lost track", "overwhelmed", "what just happened"
- Frequency calibration based on context length
- Integration with soul, gate, trace, shape

verify skill:
- Machine-verifiable acceptance criteria (moo principle #6)
- Phase 1: Lock boolean criteria before building
- Phase 2: Execute verification after building
- agent-browser CLI integration for browser checks
- Vague-to-boolean conversion table

Both skills are [A]ctivation type - Claude knows these concepts but needs
the structured framework to apply them consistently.

https://claude.ai/code/session_01BE49Z9NJRgE43NSVPV4GYq
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant