Add 13 German country-specific predefined PII PatternRecognizers and… #1830
shokrydev wants to merge 12 commits into microsoft:main
…their tests. Includes supported_entities.md and default_recognizers.yaml update. References: - Fork issue: Closes #1 - Upstream issue: Closes microsoft#1828
@microsoft-github-policy-service agree
Pull request overview
This pull request adds 13 German country-specific PII (Personally Identifiable Information) recognizers to Presidio, addressing issue #1828. The implementation includes pattern-based recognizers with checksum validation where applicable, comprehensive test coverage, and proper documentation updates.
Changes:
- Added 13 new German PII recognizers in `presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/germany/` with pattern matching and validation logic
- Added corresponding test files with comprehensive test cases covering valid/invalid formats, edge cases, and initialization tests
- Updated configuration files (`default_recognizers.yaml`, `__init__.py`) to register the new recognizers
- Updated `supported_entities.md` documentation with descriptions of all new German entity types
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `de_bsnr_recognizer.py` | Facility number recognizer with KV regional code validation |
| `de_commercial_register_recognizer.py` | Commercial register number (HRA/HRB) recognizer |
| `de_driver_license_recognizer.py` | Driver's license recognizer with alphanumeric validation |
| `de_kvnr_recognizer.py` | Health insurance number recognizer with modified Luhn checksum |
| `de_lanr_recognizer.py` | Physician number recognizer with checksum validation |
| `de_license_plate_recognizer.py` | Vehicle license plate recognizer |
| `de_passport_recognizer.py` | Passport number recognizer with checksum validation |
| `de_personal_id_recognizer.py` | Personal ID card recognizer with checksum validation |
| `de_postal_code_recognizer.py` | 5-digit postal code recognizer |
| `de_social_security_recognizer.py` | Social security number with complex checksum per VKVV § 2 |
| `de_tax_id_recognizer.py` | Tax ID recognizer with ISO 7064 MOD 11,10 checksum |
| `de_telematik_id_recognizer.py` | Healthcare IT infrastructure identifier |
| `de_vat_code_recognizer.py` | VAT number recognizer (DE + 9 digits format) |
| `test_de_*.py` (13 files) | Comprehensive test suites for each recognizer |
| `__init__.py` (2 files) | Import and export declarations for new recognizers |
| `default_recognizers.yaml` | Configuration entries (disabled by default) |
| `supported_entities.md` | Documentation table with all 13 German entity types |
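To make the "ISO 7064 MOD 11,10 checksum" entry for the tax ID recognizer concrete, here is a minimal standalone sketch of that check-digit scheme applied to an 11-digit number. It is not the code from this PR: the function names are hypothetical, and it covers only the checksum, not any additional structural rules a real recognizer might enforce.

```python
def iso7064_mod11_10_check_digit(payload: str) -> int:
    """Compute the ISO 7064 MOD 11,10 check digit for a string of ASCII digits."""
    product = 10
    for ch in payload:
        total = (int(ch) + product) % 10
        if total == 0:
            total = 10
        product = (total * 2) % 11
    check = 11 - product
    # A computed value of 10 is represented by the check digit 0.
    return 0 if check == 10 else check


def has_valid_tax_id_checksum(candidate: str) -> bool:
    """Hypothetical helper: True if the 11th digit matches the checksum of the first 10."""
    if len(candidate) != 11 or not (candidate.isascii() and candidate.isdigit()):
        return False
    return iso7064_mod11_10_check_digit(candidate[:10]) == int(candidate[10])


# Synthetic example number that satisfies the checksum, and a variant that does not.
print(has_valid_tax_id_checksum("86095742719"))  # True
print(has_valid_tax_id_checksum("86095742718"))  # False
```

In a pattern-based recognizer, a check like this would typically run after the regex match, so that strings with the right shape but a wrong check digit are rejected.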
...idio_analyzer/predefined_recognizers/country_specific/germany/de_license_plate_recognizer.py
Outdated
...esidio_analyzer/predefined_recognizers/country_specific/germany/de_postal_code_recognizer.py
Outdated
.../presidio_analyzer/predefined_recognizers/country_specific/germany/de_passport_recognizer.py
Outdated
...esidio_analyzer/predefined_recognizers/country_specific/germany/de_personal_id_recognizer.py
Outdated
…LZ documentation Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…sport validation order bug and update tests
Merge upstream branch 'main' into feat/german-patternrecognizers
omri374 left a comment
Thanks! Apologies for the delayed review.
    "umsatzsteuer-id",
    "umsatzsteuerid",
    "vat number",
    "vat id",
Consider adding "vat" too. The current context-aware mechanism works on unigrams and not on sentences, so "vat id" would not be caught.
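The reviewer's point can be illustrated with a deliberately simplified sketch. This is not Presidio's actual context enhancer; `has_supporting_context` is a hypothetical helper that assumes the surrounding text is split into single-word tokens and each configured context word is compared against those tokens one at a time.

```python
import re


def has_supporting_context(surrounding_text: str, context_words: list[str]) -> bool:
    """Simplified unigram matching: a context word only counts if it equals one token."""
    tokens = re.findall(r"[a-z0-9äöüß-]+", surrounding_text.lower())
    return any(word in tokens for word in context_words)


text = "Our VAT ID is DE123456789"

# Multi-word entries never equal a single token, so they give no support.
print(has_supporting_context(text, ["vat number", "vat id"]))  # False

# Adding the unigram "vat" lets the same text provide context support.
print(has_supporting_context(text, ["vat number", "vat id", "vat"]))  # True
```

Under this assumption, multi-word entries such as "vat id" stay in the list without effect, while the single word "vat" is what actually matches.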
@shokrydev thanks for this comprehensive addition! Please see the minor comment + CI failures.
@shokrydev are you planning on continuing this? Thanks!
… their tests. Includes supported_entities.md and default_recognizers.yaml update.
Change Description
Added 13 German country-specific predefined PII PatternRecognizers and their tests
Issue reference
Closes #1828
Checklist