AI Agent system for extracting billing rules from healthcare policy PDFs and generating SQL implementations.
policyAgent is a multi-agent pipeline that:
- Parses PDF policy documents using PaddleOCR (ONNX)
- Analyzes content to identify billing rules
- Generates SQL queries to enforce rules
- Executes queries against claims database to find violations
- Validates rules against industry standards via web search
- Produces HTML reports with violations and confidence scores
PDF → Parser → Analyzer → SQLGen → Scorer → QueryExecutor → Reporter → HTML
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
PDF+OCR LLM SQL+LLM WebSearch ClaimsDB HTML
# Clone repository
git clone https://github.com/RoniLeor/policyAgent.git
cd policyAgent
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# Install package with dev dependencies
pip install -e ".[dev]"
# Setup pre-commit hooks (required for contributors)
pre-commit install# Initialize claims database with sample data
policyagent init-claims --db claims.db
# Process a policy PDF (mock mode - no API keys needed)
policyagent process policy.pdf report.html --mock
# Process with claims database to find violations
policyagent process policy.pdf report.html --claims-db claims.db --mock
# Search indexed rules
policyagent search --cpt 69990
policyagent search --type mutual_exclusion
policyagent search --text "microsurgery"
# View database statistics
policyagent statsimport asyncio
from policyagent.orchestrator.pipeline import Pipeline
from policyagent.storage.claims_db import ClaimsDatabase
# Setup claims database
claims_db = ClaimsDatabase("claims.db")
claims_db.load_sample_data()
# Create pipeline with claims database
pipeline = Pipeline(claims_db=claims_db)
# Process a policy document
report = asyncio.run(pipeline.run(
pdf_path="policy.pdf",
output_path="report.html"
))
print(f"Found {report.total_violations} violations")Create a .env file:
# LLM Provider (openai or anthropic)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
# Or use Anthropic
# LLM_PROVIDER=anthropic
# ANTHROPIC_API_KEY=sk-ant-...
# ANTHROPIC_MODEL=claude-sonnet-4-20250514
# OCR (optional, uses defaults)
OCR_USE_GPU=false
PDF_DPI=300policyAgent/
├── src/policyagent/
│ ├── config/ # Configuration and settings
│ ├── core/ # Base classes, models, LLM clients
│ ├── tools/ # PDF, OCR, SQL, WebSearch, HTML tools
│ ├── agents/ # Parser, Analyzer, SQLGen, Scorer, Reporter
│ ├── orchestrator/ # Pipeline coordination
│ ├── storage/ # SQLite repositories (rules, claims)
│ └── templates/ # Jinja2 HTML templates
├── tests/ # Test suite
└── models/ # ONNX models (auto-downloaded)
| Type | Description | Example |
|---|---|---|
mutual_exclusion |
Services cannot appear together | CPT 69990 requires specific primary procedures |
overutilization |
Units limited over time period | Max 4 units per day for CPT 97110 |
service_not_covered |
Service not covered in plan | Cosmetic procedures excluded |
patient(patient_id, dob, gender)
provider(npi, tin)
claim(claim_id, patient_id, provider_npi)
claim_line(claim_id, dos, pos, icd10, cpt_code, units, amount, modifiers)# Install dev dependencies
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit installThe project uses pre-commit hooks for code quality:
- Ruff - Linting and formatting
- detect-secrets - Prevent committing secrets/API keys
- Bandit - Security vulnerability scanning
- Pyright - Type checking
# Run all hooks manually
pre-commit run --all-files
# Run specific hook
pre-commit run ruff --all-files
pre-commit run detect-secrets --all-files# Linting
ruff check src/ tests/
ruff check src/ tests/ --fix # Auto-fix
# Formatting
ruff format src/ tests/
# Type checking
pyright src/
# Security scan
bandit -r src/
# Run tests
pytest
# Run tests with coverage
pytest --cov=policyagent --cov-report=html
# Run all checks (recommended before commit)
ruff check src/ tests/ && ruff format --check src/ tests/ && pyright src/ && pytest# Before committing - hooks run automatically
git add .
git commit -m "feat: add new feature"
# If hooks fail, fix issues and retry
ruff check src/ --fix
git add .
git commit -m "feat: add new feature"
# Push changes
git push origin mainThe generated HTML report includes:
- Summary stats: Total rules, violations found, pages processed
- Rule details: Name, description, classification, CPT codes
- SQL implementation: Formatted query with syntax highlighting
- Confidence score: Visual bar with percentage
- Violations table: Actual claims that violate each rule
- Validation sources: Links to CMS guidelines and references
# Run with mock LLM (no API keys needed)
policyagent process tests/fixtures/test_policy.pdf report.html --mock
# Run with claims database
policyagent init-claims
policyagent process tests/fixtures/test_policy.pdf report.html --mock --claims-db claims.dbMIT