Add guide for extending AML agent evaluation and datasets #85

Merged: fcogidi merged 2 commits into main from fco/aml_data_guide on Mar 24, 2026

Conversation

fcogidi (Collaborator) commented Mar 24, 2026

Summary

Add guide for extending AML agent evaluation and datasets.

Clickup Ticket(s): N/A

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

  • Add documentation for how to extend the AML agent capabilities and evaluation dimensions.

Testing

  • Tests pass locally (uv run pytest tests/)
  • Type checking passes (uv run mypy <src_dir>)
  • Linting passes (uv run ruff check <src_dir>)
  • Manual testing performed (describe below)

Manual testing details:
N/A

Screenshots/Recordings

Related Issues

Deployment Notes

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated (if applicable)
  • No sensitive information (API keys, credentials) exposed

fcogidi self-assigned this Mar 24, 2026
fcogidi added the documentation label (Improvements or additions to documentation) Mar 24, 2026
fcogidi requested a review from Copilot March 24, 2026 15:47

Copilot AI left a comment

Pull request overview

Adds a new documentation guide explaining how the AML investigation evaluation is structured today and where to extend datasets, evaluation dimensions, and agent capabilities.

Changes:

  • Introduces a comprehensive “Extending AML Agent Evaluation and Datasets” guide for dataset generation, schema evolution, and evaluator additions.
  • Documents extension points across the AML agent, task wrapper, graders (item/trace/run), and rubrics.
  • Provides recommended next metrics, tooling ideas, and a validation loop for iterating on datasets/agents/evals.

fcogidi merged commit 6cadbae into main Mar 24, 2026
2 checks passed
fcogidi deleted the fco/aml_data_guide branch March 24, 2026 15:55
skaladhar pushed a commit to skaladhar/eval-agents that referenced this pull request Mar 26, 2026
…itute#85)

* Add guide for extending AML agent evaluation and datasets

* Fix typo

Labels

documentation Improvements or additions to documentation
