Add 13 German country-specific predefined PII PatternRecognizers and… #1830

Open
shokrydev wants to merge 12 commits into microsoft:main from shokrydev:feat/german-patternrecognizers

@shokrydev shokrydev commented Jan 11, 2026

… their tests. Includes supported_entities.md and default_recognizers.yaml update.

Change Description

Added 13 German country-specific predefined PII PatternRecognizers and their tests:

  • de_passport_recognizer.py
  • de_commercial_register_recognizer.py
  • de_driver_license_recognizer.py
  • de_vat_code_recognizer.py
  • de_bsnr_recognizer.py
  • de_tax_id_recognizer.py
  • de_social_security_recognizer.py
  • de_personal_id_recognizer.py
  • de_telematik_id_recognizer.py
  • de_license_plate_recognizer.py
  • de_postal_code_recognizer.py
  • de_kvnr_recognizer.py
  • de_lanr_recognizer.py

Issue reference

Closes #1828

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

References:
- Fork issue: Closes #1
- Upstream issue: Closes microsoft#1828
@shokrydev
Author

@microsoft-github-policy-service agree

Contributor

Copilot AI left a comment

Pull request overview

This pull request adds 13 German country-specific PII (Personally Identifiable Information) recognizers to Presidio, addressing issue #1828. The implementation includes pattern-based recognizers with checksum validation where applicable, comprehensive test coverage, and proper documentation updates.

Changes:

  • Added 13 new German PII recognizers in presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/germany/ with pattern matching and validation logic
  • Added corresponding test files with comprehensive test cases covering valid/invalid formats, edge cases, and initialization tests
  • Updated configuration files (default_recognizers.yaml, __init__.py) to register the new recognizers
  • Updated supported_entities.md documentation with descriptions of all new German entity types
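
As a rough illustration of the "pattern matching and validation logic" mentioned in the list above, the sketch below shows a two-stage recognizer in plain Python: a regex proposes candidates, and an optional validator confirms or rejects each one. It is loosely modeled on how pattern-based recognizers tend to behave and is not code from this PR. `Match`, `find_matches`, `VAT_REGEX`, the scores, and the sample text are all hypothetical, and the regex only encodes the "DE + 9 digits" format.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Match:
    start: int
    end: int
    text: str
    score: float


def find_matches(
    text: str,
    regex: str,
    base_score: float,
    validate: Optional[Callable[[str], Optional[bool]]] = None,
) -> List[Match]:
    """Return regex candidates, adjusted by an optional validator.

    Validator contract (an assumption of this sketch):
    True  -> confirmed, score raised to 1.0
    False -> candidate dropped
    None  -> no opinion, base score kept
    """
    results = []
    for m in re.finditer(regex, text):
        score = base_score
        verdict = validate(m.group()) if validate else None
        if verdict is False:
            continue
        if verdict is True:
            score = 1.0
        results.append(Match(m.start(), m.end(), m.group(), score))
    return results


# Hypothetical pattern for a German VAT ID: "DE" followed by nine digits.
VAT_REGEX = r"\bDE\d{9}\b"

matches = find_matches("USt-IdNr.: DE123456789", VAT_REGEX, base_score=0.5)
```

Keeping the validator separate from the regex lets format-only recognizers (such as a postal code) skip validation, while checksum-bearing identifiers can plug in their own check.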

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| de_bsnr_recognizer.py | Facility number recognizer with KV regional code validation |
| de_commercial_register_recognizer.py | Commercial register number (HRA/HRB) recognizer |
| de_driver_license_recognizer.py | Driver's license recognizer with alphanumeric validation |
| de_kvnr_recognizer.py | Health insurance number recognizer with modified Luhn checksum |
| de_lanr_recognizer.py | Physician number recognizer with checksum validation |
| de_license_plate_recognizer.py | Vehicle license plate recognizer |
| de_passport_recognizer.py | Passport number recognizer with checksum validation |
| de_personal_id_recognizer.py | Personal ID card recognizer with checksum validation |
| de_postal_code_recognizer.py | 5-digit postal code recognizer |
| de_social_security_recognizer.py | Social security number with complex checksum per VKVV § 2 |
| de_tax_id_recognizer.py | Tax ID recognizer with ISO 7064 MOD 11,10 checksum |
| de_telematik_id_recognizer.py | Healthcare IT infrastructure identifier |
| de_vat_code_recognizer.py | VAT number recognizer (DE + 9 digits format) |
| test_de_*.py (13 files) | Comprehensive test suites for each recognizer |
| __init__.py (2 files) | Import and export declarations for new recognizers |
| default_recognizers.yaml | Configuration entries (disabled by default) |
| supported_entities.md | Documentation table with all 13 German entity types |
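
For readers unfamiliar with the ISO 7064 MOD 11,10 checksum named in the tax ID entry, here is a minimal sketch of the check-digit computation for an 11-digit number. It is not taken from `de_tax_id_recognizer.py`; the function names are hypothetical, and the sketch assumes the last digit is the check digit over the first ten. It ignores any further structural rules the real identifier may have.

```python
def tax_id_check_digit(first_ten: str) -> int:
    """Compute the ISO 7064 MOD 11,10 check digit for a string of decimal digits."""
    product = 10
    for ch in first_ten:
        total = (int(ch) + product) % 10
        if total == 0:
            total = 10  # a zero sum is treated as 10
        product = (total * 2) % 11
    check = 11 - product
    return 0 if check == 10 else check


def has_valid_tax_id_checksum(candidate: str) -> bool:
    """Check length, digit-only content, and the trailing check digit."""
    if len(candidate) != 11 or not (candidate.isascii() and candidate.isdigit()):
        return False
    return tax_id_check_digit(candidate[:10]) == int(candidate[10])
```

For example, `has_valid_tax_id_checksum("86095742719")` holds for this sample value, while changing the final digit makes it fail. A function like this would typically serve as the validation step after a regex has found an 11-digit candidate.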

omri374 previously approved these changes Feb 12, 2026
Collaborator

@omri374 omri374 left a comment

Thanks! Apologies for the delayed review.

"umsatzsteuer-id",
"umsatzsteuerid",
"vat number",
"vat id",
Collaborator

Consider adding "vat" too. The current context-aware mechanism works on unigrams rather than multi-word phrases, so "vat id" would not be caught.
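
To make the unigram point concrete, here is a simplified sketch of token-level context matching. It is not Presidio's actual context enhancer; `matched_context_words` and the sample sentence are hypothetical. It assumes only that each context word is compared against single tokens of the surrounding text.

```python
import re
from typing import Iterable, List


def matched_context_words(surrounding_text: str, context_words: Iterable[str]) -> List[str]:
    """Return the context words that equal a single token of the text."""
    # Split into single tokens (unigrams); hyphenated words stay together.
    tokens = {t.lower() for t in re.findall(r"[\w-]+", surrounding_text)}
    return [w for w in context_words if w.lower() in tokens]


text = "Please send the VAT ID DE123456789"

# Multi-word entries never equal a single token, so nothing matches.
before = matched_context_words(text, ["umsatzsteuer-id", "vat number", "vat id"])

# Adding the unigram "vat" gives the matcher something it can hit.
after = matched_context_words(text, ["umsatzsteuer-id", "vat number", "vat id", "vat"])
```

Under this assumption, `before` is empty and `after` contains only `"vat"`, which is why a single-word entry is worth adding alongside the two-word ones.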

@omri374
Collaborator

omri374 commented Feb 16, 2026

@shokrydev thanks for this comprehensive addition! Please see the minor comment + CI failures.

@SharonHart
Contributor

@shokrydev are you planning on continuing this? Thanks!

Development

Successfully merging this pull request may close these issues.

feat: add German country-specific predefined recognizers

4 participants