Add guide for extending AML agent evaluation and datasets by fcogidi · Pull Request #85 · VectorInstitute/eval-agents

fcogidi · 2026-03-24T15:46:18Z

Summary

Add guide for extending AML agent evaluation and datasets.

Clickup Ticket(s): N/A

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📝 Documentation update
🔧 Refactoring (no functional changes)
⚡ Performance improvement
🧪 Test improvements
🔒 Security fix

Changes Made

Add documentation for how to extend the AML agent capabilities and evaluation dimensions.

Testing

Tests pass locally (uv run pytest tests/)
Type checking passes (uv run mypy <src_dir>)
Linting passes (uv run ruff check <src_dir>/)
Manual testing performed (describe below)

Manual testing details:
N/A

Screenshots/Recordings

Related Issues

Deployment Notes

Checklist

Code follows the project's style guidelines
Self-review of code completed
Documentation updated (if applicable)
No sensitive information (API keys, credentials) exposed

Copilot

Pull request overview

Adds a new documentation guide explaining how the AML investigation evaluation is structured today and where to extend datasets, evaluation dimensions, and agent capabilities.

Changes:

Introduces a comprehensive “Extending AML Agent Evaluation and Datasets” guide for dataset generation, schema evolution, and evaluator additions.
Documents extension points across the AML agent, task wrapper, graders (item/trace/run), and rubrics.
Provides recommended next metrics, tooling ideas, and a validation loop for iterating on datasets/agents/evals.

implementations/aml_investigation/data/building-aml-eval-datasets.md

Add guide for extending AML agent evaluation and datasets

43527f3

fcogidi self-assigned this Mar 24, 2026

fcogidi added the documentation Improvements or additions to documentation label Mar 24, 2026

fcogidi requested a review from Copilot March 24, 2026 15:47

Copilot started reviewing on behalf of fcogidi March 24, 2026 15:47

Copilot AI reviewed Mar 24, 2026

implementations/aml_investigation/data/building-aml-eval-datasets.md (outdated, resolved)

implementations/aml_investigation/data/building-aml-eval-datasets.md (resolved)

Fix typo

d6387eb

fcogidi merged commit 6cadbae into main Mar 24, 2026
2 checks passed

fcogidi deleted the fco/aml_data_guide branch March 24, 2026 15:55

skaladhar pushed a commit to skaladhar/eval-agents that referenced this pull request Mar 26, 2026

Add guide for extending AML agent evaluation and datasets (VectorInst…

099b357

…itute#85) * Add guide for extending AML agent evaluation and datasets * Fix typo

fcogidi merged 2 commits into main from fco/aml_data_guide