
Add Presidio Evaluator to AI Evaluation Assistant #1905

Open
negruber1 wants to merge 9 commits into ronshakutai/presidio-evaluation-repo from noa/add-eval-step

Conversation

@negruber1 (Collaborator)

This pull request introduces several improvements and fixes to the AI Assistant evaluation backend and frontend, focusing on dataset portability, evaluation metrics visualization, and dependency management. The most significant changes include making dataset paths portable across machines, enhancing the evaluation UI to display metrics per entity type, and updating dependencies for smoother setup and operation.

Dataset portability and management:

  • Refactored how dataset stored_path and path are saved in the registry to be relative instead of absolute, making dataset references portable across different machines. The loader now gracefully handles legacy absolute paths and attempts to resolve them if not found.
  • Updated the example dataset entry and added a new sample medical records dataset in datasets.json, using relative stored_path values.
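The resolution logic described above can be sketched as follows. This is a minimal illustration, not the PR's actual loader; the function name, the `registry_root` parameter, and the filename-based fallback for legacy absolute paths are assumptions made for the example.

```python
from pathlib import Path

def resolve_dataset_path(stored_path: str, registry_root: Path) -> Path:
    """Resolve a dataset path from the registry.

    New registry entries store paths relative to the registry root;
    legacy entries may still hold absolute paths recorded on another
    machine, which we try to resolve gracefully.
    """
    p = Path(stored_path)
    if not p.is_absolute():
        # Portable, relative entry: anchor it at the registry root.
        return registry_root / p
    if p.exists():
        # Legacy absolute path that still resolves on this machine.
        return p
    # Legacy absolute path from another machine: retry by filename
    # under the registry root before giving up.
    candidate = registry_root / p.name
    if candidate.exists():
        return candidate
    raise FileNotFoundError(f"Dataset not found: {stored_path}")
```

Storing relative paths and resolving them against a known root is what makes the registry portable; the fallback only exists so registries written before this change keep working.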

Evaluation UI improvements:

  • Enhanced the evaluation page to display metrics per entity type in a table, in addition to overall metrics per config. This includes new UI components and data structures for precision, recall, and F1 scores for each entity type.
  • Replaced the bar chart visualization of overall metrics with a more readable card-based layout per config.
  • Removed unused imports related to the previous charting implementation.
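The per-entity-type table needs precision, recall, and F1 aggregated per entity. A backend-side sketch of that aggregation is below; the function name and the `(entity_type, tp, fp, fn)` input shape are assumptions for illustration, since the real backend derives these counts from presidio-evaluator results.

```python
from collections import defaultdict

def per_entity_metrics(results):
    """Aggregate TP/FP/FN counts per entity type into precision,
    recall, and F1 -- the shape the per-entity table would render.

    `results` is an iterable of (entity_type, tp, fp, fn) tuples.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for entity_type, tp, fp, fn in results:
        c = counts[entity_type]
        c[0] += tp
        c[1] += fp
        c[2] += fn
    metrics = {}
    for entity_type, (tp, fp, fn) in counts.items():
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[entity_type] = {
            "precision": precision, "recall": recall, "f1": f1,
        }
    return metrics
```

For example, `per_entity_metrics([("PERSON", 8, 2, 0)])` yields precision 0.8 and recall 1.0 for PERSON, which the UI can render as one table row per entity type alongside the overall per-config cards.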

API and dependency updates:

  • Changed the evaluation API call from fetching runs to triggering a new evaluation run, aligning the frontend with backend expectations.
  • Added presidio-evaluator as a backend dependency and updated setup instructions to include the required spaCy model download for local evaluation.

Miscellaneous fixes:

  • Cleaned up package-lock.json by removing unnecessary "peer": true fields from several dependencies.

These changes improve the robustness, usability, and maintainability of the evaluation workflow for AI Assistant.

@negruber1 requested a review from a team as a code owner on March 12, 2026 at 09:51
@RonShakutai (Collaborator) left a comment


Great PR! One bug, but all the rest looks good.
Let's take it together next week.

from models import Entity, EntityMiss, EvaluationRun, MissType, RiskLevel
from models import Entity, EntityMiss, MissType, RiskLevel
from presidio_evaluator import InputSample, Span, span_to_tag
from presidio_evaluator.evaluation import EvaluationResult, SpanEvaluator

There is an issue when the ground truth contains two entities that overlap.
In that case the page breaks.


This raises a question: if the user tags the same area twice and they overlap, the page crashes. Should we limit the user in the UI, or should we resolve it in the backend?
We need to decide.

If I go back to the tagging and make sure there are no overlapping entities in the golden dataset, the error is resolved.
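If the decision lands on backend-side validation, one option is to detect overlapping ground-truth spans before evaluation and reject or surface them, instead of letting the evaluator crash the page. A minimal sketch, assuming half-open `(start, end)` span tuples (the function and representation are hypothetical, not code from this PR):

```python
def find_overlaps(spans):
    """Return pairs of overlapping (start, end) spans.

    Spans are treated as half-open intervals, so (0, 5) and (5, 8)
    touch but do not overlap. A guard like this could run on the
    golden dataset before evaluation starts.
    """
    ordered = sorted(spans, key=lambda s: (s[0], s[1]))
    overlaps = []
    widest = None  # earlier span with the furthest-reaching end
    for span in ordered:
        if widest is not None and span[0] < widest[1]:
            overlaps.append((widest, span))
        if widest is None or span[1] > widest[1]:
            widest = span
    return overlaps
```

Running this at dataset-save time would also support the UI-side option: the frontend could call the same check and refuse overlapping annotations as the user tags.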
