diff --git a/.github/workflows/eval.yml b/.github/workflows/eval.yml
index 7aaf3b1..4c7fd35 100644
--- a/.github/workflows/eval.yml
+++ b/.github/workflows/eval.yml
@@ -46,6 +46,11 @@ jobs:
             3. Process the prompt as if a user sent it (let skills auto-trigger naturally)
             4. Self-evaluate your response against expected behaviors
 
+            ## CRITICAL
+            - The skill MUST actually auto-trigger via the Skill tool
+            - Do NOT simulate or roleplay the skill behavior
+            - If the skill does not trigger, report VERDICT: FAIL
+
             ## Required Output Format
             You MUST end your response with exactly one of these lines:
             - `VERDICT: PASS` - if skill triggered AND all expected behaviors observed
@@ -53,7 +58,13 @@ jobs:
             - `VERDICT: FAIL` - if skill did not trigger or wrong skill triggered
 
             Before the verdict, explain your reasoning briefly.
-          claude_args: '--plugin-dir ./hope --plugin-dir ./product --plugin-dir ./wordsmith --plugin-dir ./founder --plugin-dir ./career'
+          plugin_marketplaces: ./
+          plugins: |
+            hope@moo.md
+            product@moo.md
+            wordsmith@moo.md
+            founder@moo.md
+            career@moo.md
           show_full_output: true
 
       - name: Check Verdict
diff --git a/eval/README.md b/eval/README.md
index 879c2e4..1122d6d 100644
--- a/eval/README.md
+++ b/eval/README.md
@@ -65,3 +65,8 @@ Then add to CI matrix in `.github/workflows/eval.yml`.
 ## Schema
 
 See `eval/schema.json` for the structured output format.
+
+## CI Notes
+
+- **First-time workflow**: When the eval workflow is first added via a PR, it does not run on that PR because of GitHub security validation. It starts running on PRs opened after the merge.
+- **Workflow changes**: The same applies when modifying `.github/workflows/eval.yml`: changes take effect only after the PR is merged.