
Add Presidio Evaluator to AI Evaluation Assistant #1905

Open
negruber1 wants to merge 9 commits into ronshakutai/presidio-evaluation-repo from noa/add-eval-step

Conversation

@negruber1 (Collaborator)

This pull request introduces several improvements and fixes to the AI Assistant evaluation backend and frontend, focusing on dataset portability, evaluation metrics visualization, and dependency management. The most significant changes include making dataset paths portable across machines, enhancing the evaluation UI to display metrics per entity type, and updating dependencies for smoother setup and operation.

Dataset portability and management:

  • Refactored how dataset stored_path and path are saved in the registry to be relative instead of absolute, making dataset references portable across different machines. The loader now gracefully handles legacy absolute paths and attempts to resolve them if not found.
  • Updated the example dataset entry and added a new sample medical records dataset in datasets.json, using relative stored_path values.
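The resolution logic described above can be sketched as follows. This is a minimal illustration, not the PR's actual loader; the function name, the `registry_root` parameter, and the filename-based fallback for legacy absolute paths are assumptions made for the example.

```python
from pathlib import Path

def resolve_dataset_path(stored_path: str, registry_root: Path) -> Path:
    """Resolve a dataset path from the registry.

    New registry entries store paths relative to the registry root;
    legacy entries may still hold absolute paths recorded on another
    machine, which we try to resolve gracefully.
    """
    p = Path(stored_path)
    if not p.is_absolute():
        # Portable, relative entry: anchor it at the registry root.
        return registry_root / p
    if p.exists():
        # Legacy absolute path that still resolves on this machine.
        return p
    # Legacy absolute path from another machine: retry by filename
    # under the registry root before giving up.
    candidate = registry_root / p.name
    if candidate.exists():
        return candidate
    raise FileNotFoundError(f"Dataset not found: {stored_path}")
```

Storing relative paths and resolving them against a known root is what makes the registry portable; the fallback only exists so registries written before this change keep working.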

Evaluation UI improvements:

  • Enhanced the evaluation page to display metrics per entity type in a table, in addition to overall metrics per config. This includes new UI components and data structures for precision, recall, and F1 scores for each entity type.
  • Replaced the bar chart visualization of overall metrics with a more readable card-based layout per config.
  • Removed unused imports related to the previous charting implementation.
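The per-entity-type table needs precision, recall, and F1 aggregated per entity. A backend-side sketch of that aggregation is below; the function name and the `(entity_type, tp, fp, fn)` input shape are assumptions for illustration, since the real backend derives these counts from presidio-evaluator results.

```python
from collections import defaultdict

def per_entity_metrics(results):
    """Aggregate TP/FP/FN counts per entity type into precision,
    recall, and F1 -- the shape the per-entity table would render.

    `results` is an iterable of (entity_type, tp, fp, fn) tuples.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for entity_type, tp, fp, fn in results:
        c = counts[entity_type]
        c[0] += tp
        c[1] += fp
        c[2] += fn
    metrics = {}
    for entity_type, (tp, fp, fn) in counts.items():
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[entity_type] = {
            "precision": precision, "recall": recall, "f1": f1,
        }
    return metrics
```

For example, `per_entity_metrics([("PERSON", 8, 2, 0)])` yields precision 0.8 and recall 1.0 for PERSON, which the UI can render as one table row per entity type alongside the overall per-config cards.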

API and dependency updates:

  • Changed the evaluation API call from fetching runs to triggering a new evaluation run, aligning the frontend with backend expectations.
  • Added presidio-evaluator as a backend dependency and updated setup instructions to include the required spaCy model download for local evaluation.

Miscellaneous fixes:

  • Cleaned up package-lock.json by removing unnecessary "peer": true fields from several dependencies.

These changes improve the robustness, usability, and maintainability of the evaluation workflow for AI Assistant.

@negruber1 requested a review from a team as a code owner on March 12, 2026 at 09:51
@RonShakutai (Collaborator) left a comment


Great PR! One bug, but all the rest looks good.
Let's take it together next week.

from models import Entity, EntityMiss, EvaluationRun, MissType, RiskLevel
from models import Entity, EntityMiss, MissType, RiskLevel
from presidio_evaluator import InputSample, Span, span_to_tag
from presidio_evaluator.evaluation import EvaluationResult, SpanEvaluator

There is an issue when the ground truth contains two entities that overlap.
In that case the page breaks.


This raises a question: if the user tags the same area twice and they overlap, the page crashes. Should we limit the user in the UI, or should we resolve it in the backend?
We need to decide.

If I go back to the tagging and make sure there are no overlapping entities in the golden dataset, the error is resolved.
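If the decision lands on backend-side validation, one option is to detect overlapping ground-truth spans before evaluation and reject or surface them, instead of letting the evaluator crash the page. A minimal sketch, assuming half-open `(start, end)` span tuples (the function and representation are hypothetical, not code from this PR):

```python
def find_overlaps(spans):
    """Return pairs of overlapping (start, end) spans.

    Spans are treated as half-open intervals, so (0, 5) and (5, 8)
    touch but do not overlap. A guard like this could run on the
    golden dataset before evaluation starts.
    """
    ordered = sorted(spans, key=lambda s: (s[0], s[1]))
    overlaps = []
    widest = None  # earlier span with the furthest-reaching end
    for span in ordered:
        if widest is not None and span[0] < widest[1]:
            overlaps.append((widest, span))
        if widest is None or span[1] > widest[1]:
            widest = span
    return overlaps
```

Running this at dataset-save time would also support the UI-side option: the frontend could call the same check and refuse overlapping annotations as the user tags.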
