feat: add content security scanning and apm audit command #313

Open
danielmeppiel wants to merge 9 commits into main from feat/content-security-scanner

danielmeppiel (Collaborator) commented Mar 15, 2026

Content Security Scanner — apm audit + Install-Time Pre-Deploy Gate

Closes #312

Problem

Shared prompt/rules files (.cursorrules, .github/prompts/, etc.) are becoming a de facto supply chain, but without integrity guarantees. Hidden Unicode characters — particularly tag characters (U+E0001–E007F) and bidi overrides — can embed invisible instructions that LLMs tokenize and follow but humans can't see. Unlike npm packages where code sits inert until executed, prompt files are read by IDE agents the moment they land on disk. File presence IS execution.
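
To make the threat concrete, here is a minimal sketch of how tag characters can carry a payload that most editors and terminals do not display. The helper names `hide_in_tags` and `reveal_tags` are hypothetical and are not part of this PR; the sketch relies only on the fact that the Unicode tag block mirrors printable ASCII at an offset of U+E0000.

```python
# Illustrative only: hide_in_tags / reveal_tags are hypothetical helper names.
TAG_BASE = 0xE0000  # tag characters mirror ASCII at U+E0000 + code point


def hide_in_tags(text: str) -> str:
    """Map printable ASCII to the (normally invisible) Unicode tag block."""
    return "".join(
        chr(TAG_BASE + ord(ch)) for ch in text if 0x20 <= ord(ch) <= 0x7E
    )


def reveal_tags(text: str) -> str:
    """Recover any ASCII payload carried by tag characters in text."""
    return "".join(
        chr(ord(ch) - TAG_BASE) for ch in text if 0xE0020 <= ord(ch) <= 0xE007E
    )


visible = "Always write unit tests."
poisoned = visible + hide_in_tags("ignore previous instructions")

# The two strings differ in length even where they render alike.
print(len(visible), len(poisoned))
print(reveal_tags(poisoned))
```

A reviewer skimming a diff would typically see only the visible sentence, which is why a scanner has to inspect code points rather than rendered text.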

Solution

Three scanning gates, plus an on-demand command:

  1. Pre-deploy gate during apm install — scans all source files in apm_modules/{pkg}/ BEFORE integrators copy them to targets (.github/, .claude/, etc.). Critical findings block deployment unless --force.

  2. Compile-time scanning — scans compiled output (AGENTS.md, CLAUDE.md, .claude/commands/) before writing to disk. Defense-in-depth: source files were already scanned at install, but the compiled output is what agents actually read.

  3. Pack-time scanning — scans files before bundling with apm pack. Publishing-side gate prevents authors from distributing tainted content.

  4. apm audit command — on-demand scanning of installed packages or arbitrary files.

Features

  • apm audit — scan all installed packages
  • apm audit <package> — scan a specific package
  • apm audit --file .cursorrules — scan any file
  • apm audit --strip — remove non-critical chars (zero-width spaces, unusual whitespace)
  • apm audit --verbose — show info-level findings
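
As a rough illustration of what `--strip` is described as doing, the sketch below removes a small set of non-critical characters and leaves everything else in place. The `STRIPPABLE` set and the function name `strip_non_critical` are assumptions made for this example; the real scanner's character table and API may differ.

```python
# Hypothetical subset of non-critical characters; the real table is larger.
STRIPPABLE = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
}


def strip_non_critical(text: str) -> str:
    """Drop characters in STRIPPABLE; leave all other characters untouched."""
    if text.isascii():
        # Pure-ASCII text cannot contain any of the characters above.
        return text
    return "".join(ch for ch in text if ch not in STRIPPABLE)


cleaned = strip_non_critical("be\u200bnign")
```

In this sketch, critical characters such as bidi overrides are deliberately not removed, on the assumption that they should stay in the file for a human to review rather than be silently rewritten.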

Severity Levels

| Level | Characters | Action |
| --- | --- | --- |
| Critical | Tag chars (U+E0001–E007F), bidi overrides | Block install (unless --force) |
| Warning | Zero-width spaces/joiners, mid-file BOM | Allow install, record diagnostic |
| Info | Non-breaking spaces, unusual whitespace | Allow install, visible with --verbose |

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Clean or info-only |
| 1 | Critical findings |
| 2 | Warnings only (no critical) |
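
The exit-code table can be read as a small decision function. The sketch below is one way to express it, assuming findings are reduced to severity strings; `audit_exit_code` is a hypothetical name and not the function used in audit.py.

```python
from typing import Iterable


def audit_exit_code(severities: Iterable[str]) -> int:
    """Map finding severities to the documented exit codes.

    0 = clean or info-only, 1 = any critical, 2 = warnings without critical.
    """
    seen = set(severities)
    if "critical" in seen:
        return 1
    if "warning" in seen:
        return 2
    return 0  # no findings, or info-level findings only
```

Checking for critical before warning keeps exit code 1 dominant when both are present, and info-level findings never change the result.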

Security Hardening

  • Pre-deploy architecture: scan happens before files reach target directories — agents never see compromised files
  • Compile/pack scanning: defense-in-depth for the final files agents read and packages authors publish
  • Symlink protection: is_symlink() checks prevent traversal outside package directory
  • Path traversal guards: _is_safe_lockfile_path() validates all lockfile paths in apm audit
  • _apply_strip() path validation: ensures strip only writes within project root
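
The path checks listed above follow a common pattern: resolve the candidate path and confirm it still sits under a trusted root. The sketch below shows that pattern in isolation. `is_within_root` is a hypothetical helper written for this example and is not the body of `_is_safe_lockfile_path()` or `_apply_strip()`.

```python
from pathlib import Path


def is_within_root(candidate: str, root: Path) -> bool:
    """Return True only if candidate resolves to a location inside root."""
    root_resolved = root.resolve()
    # Joining an absolute candidate discards root, so it is rejected below.
    resolved = (root_resolved / candidate).resolve()
    return resolved == root_resolved or root_resolved in resolved.parents
```

Resolving both sides first means that `..` segments and symlinked parents are collapsed before the comparison, so a lockfile entry such as `../outside.md` is rejected.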

Performance

  • str.isascii() fast-path: ~5000x faster for pure-ASCII files (90%+ of prompts)
  • Compile/pack scanning adds negligible overhead for ASCII content (the isascii() check takes <1μs)
  • Early termination: stops scanning when critical found and --force not set
  • ContentScanner.classify(): combined has_critical + summarize in single pass
  • O(1) per-character lookup via pre-built _CHAR_LOOKUP dict (156 entries)
  • Typical package (10-50 files): <10ms total scan time
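
The performance points above combine three ideas: an ASCII fast-path, a dictionary lookup per character, and a single pass that both counts findings and detects critical ones. The sketch below illustrates them under stated assumptions: `CHAR_SEVERITY` is an abbreviated stand-in for the 156-entry `_CHAR_LOOKUP`, and these `scan_text` and `classify` functions are simplified versions written for this example, not the PR's implementations.

```python
from typing import Dict, List, Tuple

# Hypothetical, abbreviated table; the PR describes a 156-entry _CHAR_LOOKUP.
CHAR_SEVERITY: Dict[str, str] = {
    "\u202d": "critical",  # left-to-right override (bidi)
    "\u202e": "critical",  # right-to-left override (bidi)
    "\u200b": "warning",   # zero-width space
    "\u200d": "warning",   # zero-width joiner
    "\u00a0": "info",      # non-breaking space
}
# Tag characters U+E0001 through U+E007F are all treated as critical.
CHAR_SEVERITY.update({chr(cp): "critical" for cp in range(0xE0001, 0xE0080)})

Finding = Tuple[int, int, str]  # (line, column, severity), both 1-based


def scan_text(text: str) -> List[Finding]:
    """Return findings for every character present in CHAR_SEVERITY."""
    if text.isascii():
        return []  # fast path: every table entry is non-ASCII
    findings: List[Finding] = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            severity = CHAR_SEVERITY.get(ch)  # O(1) dict lookup
            if severity is not None:
                findings.append((line_no, col, severity))
    return findings


def classify(findings: List[Finding]) -> Tuple[bool, Dict[str, int]]:
    """Compute has-critical and per-severity counts in one pass."""
    counts = {"critical": 0, "warning": 0, "info": 0}
    for _, _, severity in findings:
        counts[severity] += 1
    return counts["critical"] > 0, counts


sample_findings = scan_text("ok\nbad\u202eline\u00a0")
```

The fast path is safe only because every entry in the table is non-ASCII; if an ASCII control character were ever added to the table, the shortcut would have to be revisited.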

Files

New:

  • src/apm_cli/security/__init__.py + content_scanner.py — pure/stateless scanning engine
  • src/apm_cli/commands/audit.py — apm audit command
  • tests/unit/test_content_scanner.py — 39 scanner tests
  • tests/unit/test_audit_command.py — 33 audit command tests
  • tests/unit/test_install_scanning.py — install scanning tests

Modified:

  • src/apm_cli/commands/install.py — _pre_deploy_security_scan() wired into all 3 install paths
  • src/apm_cli/commands/compile.py — compile-time output scanning
  • src/apm_cli/compilation/agents_compiler.py — CLAUDE.md output scanning
  • src/apm_cli/compilation/claude_formatter.py — commands output scanning
  • src/apm_cli/bundle/packer.py — pack-time input scanning
  • src/apm_cli/utils/diagnostics.py — severity field, security() method, has_critical_security
  • src/apm_cli/utils/console.py — markup=False fix for Rich [i] parsing
  • docs/src/content/docs/enterprise/security.md — rewritten with threat model, three scanning gates
  • docs/src/content/docs/enterprise/governance.md — trimmed duplicates, cross-linked
  • docs/src/content/docs/reference/cli-commands.md — apm audit, compile/pack scanning docs

Future work (tracked)

All 1853 unit tests passing.

Copilot AI review requested due to automatic review settings March 15, 2026 07:13

Copilot AI left a comment

Pull request overview

Adds supply-chain style content integrity scanning to APM to detect hidden/invisible Unicode characters in prompt/rules files, exposed both via a new apm audit command and as install-time diagnostics.

Changes:

  • Introduces a dependency-free ContentScanner plus install-time wiring through BaseIntegrator.scan_deployed_files() and DiagnosticCollector’s new security category.
  • Adds apm audit command with --file, --fix, and --verbose modes and corresponding unit tests.
  • Updates docs and changelog to document the new security scanning and command behavior.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/apm_cli/security/content_scanner.py | Implements the hidden-Unicode scanner and non-critical stripping helper. |
| src/apm_cli/security/__init__.py | Exposes scanner types for import ergonomics. |
| src/apm_cli/integration/base_integrator.py | Adds install-time scanning hook (scan_deployed_files). |
| src/apm_cli/utils/diagnostics.py | Adds security category, counting/flags, and rendering group. |
| src/apm_cli/commands/install.py | Wires scanning after integration and tags findings with package_name. |
| src/apm_cli/commands/audit.py | Implements new apm audit CLI command with lockfile and --file modes. |
| src/apm_cli/cli.py | Registers the new audit command. |
| tests/unit/test_content_scanner.py | Unit tests for scanner detection, positioning, and stripping behavior. |
| tests/unit/test_audit_command.py | Unit tests for apm audit modes, exit codes, and --fix. |
| tests/unit/test_install_scanning.py | Unit tests for install-time scanning/diagnostics integration. |
| docs/src/content/docs/reference/cli-commands.md | Documents apm audit usage, options, and exit codes. |
| docs/src/content/docs/enterprise/security.md | Documents content scanning as a supply-chain mitigation. |
| docs/src/content/docs/enterprise/governance.md | Adds governance guidance for apm audit and install-time scanning. |
| CHANGELOG.md | Adds Unreleased entries describing audit/scanning features. |
Comments suppressed due to low confidence (1)

src/apm_cli/commands/audit.py:347

  • Exit-code selection treats any findings (including info-level) as non-clean (sys.exit(2) when no critical findings). If info-level findings are intended to be non-failing, the exit logic should distinguish warnings vs info. If info-level findings should still return 2, the messaging should clearly state that exit code 2 includes info-only cases (and _render_summary() should reflect that).
    # -- Exit code --
    if not findings_by_file:
        sys.exit(0)

    all_findings = [f for ff in findings_by_file.values() for f in ff]
    if ContentScanner.has_critical(all_findings):
        sys.exit(1)
    sys.exit(2)

danielmeppiel force-pushed the feat/content-security-scanner branch 3 times, most recently from f6db857 to 5b91e46, March 15, 2026 09:13
Add supply chain integrity scanning for prompt files — detects hidden
Unicode characters (tag characters, bidi overrides, zero-width chars)
that can embed invisible instructions in shared rules files.

New features:
- apm audit: scan installed packages for hidden Unicode characters
- apm audit --file: scan arbitrary files (gateway for non-APM users)
- apm audit --fix: auto-strip non-critical characters
- Install-time scanning: apm install surfaces findings in diagnostics

Architecture:
- Pure stateless ContentScanner with O(1) per-character lookup
- Three severity levels: critical, warning, info
- Security category in DiagnosticCollector
- BaseIntegrator.scan_deployed_files() bridge method

72 new tests across 3 test files (scanner, audit command, install scanning).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Adds hidden-Unicode content integrity scanning to APM, exposing it via a new apm audit command and integrating a pre-deployment scan into apm install, with findings surfaced through diagnostics and documented in the site docs.

Changes:

  • Introduce a dependency-free ContentScanner that detects suspicious Unicode characters and can optionally strip non-critical ones.
  • Add apm audit (lockfile-mode + --file, plus --strip and verbose output) and wire pre-deploy scanning into apm install.
  • Extend diagnostics with a new security category (rendered first), plus tests and documentation updates.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/apm_cli/security/content_scanner.py | Implements the core scanner, findings model, and strip helper. |
| src/apm_cli/security/__init__.py | Exposes scanner types from the new security package. |
| src/apm_cli/commands/audit.py | Adds the apm audit CLI command and related helpers for scanning/stripping. |
| src/apm_cli/commands/install.py | Adds a pre-deployment security gate to block critical hidden characters unless --force. |
| src/apm_cli/integration/base_integrator.py | Adds scan_deployed_files() helper for post-integration scanning into diagnostics. |
| src/apm_cli/utils/diagnostics.py | Adds CATEGORY_SECURITY, security() recording, and security-first rendering. |
| src/apm_cli/cli.py | Registers the new audit command. |
| tests/unit/test_content_scanner.py | Unit tests for scanner detection, positions, and stripping. |
| tests/unit/test_audit_command.py | CLI tests for apm audit modes, exit codes, filtering, and --strip. |
| tests/unit/test_install_scanning.py | Tests pre-deploy scan gating + deployed-file scan diagnostics behavior. |
| docs/src/content/docs/reference/cli-commands.md | Documents the new apm audit command and install scanning behavior. |
| docs/src/content/docs/enterprise/security.md | Expands enterprise security model to include content scanning and threat model. |
| docs/src/content/docs/enterprise/governance.md | Adds governance workflow section for apm audit and planned CI mode. |
| CHANGELOG.md | Adds an Unreleased changelog entry for content security scanning/audit. |
Comments suppressed due to low confidence (1)

src/apm_cli/commands/install.py:589

  • PR description mentions install-time scanning of deployed files via BaseIntegrator.scan_deployed_files(), but the install flow shown here doesn’t call it after integrating primitives (only the pre-deploy scan runs). If per-deployed-file diagnostics are intended, wire scan_deployed_files() into the integration pipeline.
def _integrate_package_primitives(
    package_info,
    project_root,
    *,
    integrate_vscode,

- Add str.isascii() fast-path to scan_text() (~5000x faster for ASCII files)
- Early termination in _pre_deploy_security_scan() when critical found (not force)
- Add ContentScanner.classify() combining has_critical + summarize in one pass
- Fix symlink following in _pre_deploy_security_scan and scan_deployed_files
- Fix Rich markup parsing of [i]/[!] in _rich_echo (markup=False)
- Fix exit code docs wording in governance.md and cli-commands.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
danielmeppiel and others added 2 commits March 15, 2026 16:25
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Adds a new content security scanning capability to APM to detect hidden Unicode characters in prompt/rules files, exposed via a new apm audit command and enforced as an install-time pre-deploy gate.

Changes:

  • Introduces a dependency-free ContentScanner engine with severity classification and optional stripping of non-critical characters.
  • Adds apm audit (lockfile scanning + --file, --strip, --verbose) and wires install-time scanning to block critical findings unless --force.
  • Extends diagnostics to support a first-class security category with typed severity and updated rendering, plus docs/tests coverage.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/apm_cli/security/content_scanner.py | New scanning engine for suspicious Unicode characters + strip support |
| src/apm_cli/security/__init__.py | Exposes scanner API from the security package |
| src/apm_cli/commands/audit.py | New apm audit command (lockfile + file modes, strip, rendering, exit codes) |
| src/apm_cli/commands/install.py | Adds _pre_deploy_security_scan() gate and wires it into install flows |
| src/apm_cli/integration/base_integrator.py | Adds scan_deployed_files() to push security findings into diagnostics |
| src/apm_cli/utils/diagnostics.py | Adds CATEGORY_SECURITY, per-item severity, and security rendering/query helpers |
| src/apm_cli/utils/console.py | Adjusts Rich printing to avoid unintended markup parsing |
| src/apm_cli/cli.py | Registers the new audit command |
| tests/unit/test_content_scanner.py | Unit tests for scanner detection, positioning, summarize/classify, stripping |
| tests/unit/test_audit_command.py | Unit tests for CLI behavior, exit codes, strip behavior, lockfile scanning |
| tests/unit/test_install_scanning.py | Unit tests for install-time gate + deployed-file scanning diagnostics |
| docs/src/content/docs/reference/cli-commands.md | Documents apm audit and install-time scanning / --force behavior |
| docs/src/content/docs/enterprise/security.md | Updates threat model + security posture explanation for content scanning |
| docs/src/content/docs/enterprise/governance.md | Adds governance guidance for apm audit usage + exit codes |
| CHANGELOG.md | Adds Unreleased entry for the feature |
| .copilot/session-state/.../plan.md | Adds a Copilot session plan document (likely unintended repo artifact) |

danielmeppiel and others added 2 commits March 15, 2026 17:11
- Remove dead scan_deployed_files() from BaseIntegrator (orphaned by
  pre-deploy shift, zero callers)
- Add content scanning to apm compile output (AGENTS.md, CLAUDE.md,
  commands) before writing to disk — defense-in-depth using isascii()
  fast-path
- Add content scanning to apm pack before bundling — publishing-side
  check warns on hidden characters
- Update security docs: document three scanning gates (install, compile,
  pack) and planned hardening roadmap
- Update CLI commands docs: compile and pack scanning behavior
- Replace dead-code tests with ContentScanner-based equivalents

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…lution

- Replace rglob('*') with os.walk(followlinks=False) in both
  _pre_deploy_security_scan() and _scan_files_in_dir() to prevent
  traversal into symlinked directories outside the package tree
- Reword early-termination diagnostic to 'at least N critical' since
  the scan may have stopped before counting all findings
- Resolve --file paths to absolute before keying findings so that
  --strip works correctly with relative paths like ../some.md
- Add test proving symlinked directories are not followed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
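
The symlink fix in the commit above replaces rglob('*') with os.walk(followlinks=False). A minimal sketch of that traversal pattern follows; `iter_package_files` is a hypothetical name chosen for this example, and the real `_pre_deploy_security_scan()` and `_scan_files_in_dir()` do more than list files.

```python
import os
from pathlib import Path
from typing import Iterator


def iter_package_files(package_dir: Path) -> Iterator[Path]:
    """Yield regular files under package_dir without following symlinks."""
    # followlinks=False: symlinked directories are listed but never entered.
    for dirpath, _dirnames, filenames in os.walk(package_dir, followlinks=False):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # skip file symlinks that may point outside the tree
            yield path
```

With this shape, a symlinked directory inside a package cannot pull files from elsewhere on disk into the scan, and a symlinked file is skipped rather than read through.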

Copilot AI left a comment

Pull request overview

Adds a content security scanning layer to APM to detect hidden/invisible Unicode characters in prompt/rules files, with enforcement during install and on-demand via a new apm audit command.

Changes:

  • Introduces a dependency-free ContentScanner engine plus an apm audit CLI command (file-mode and lockfile-mode, optional --strip, exit codes).
  • Wires scanning into apm install as a pre-deploy gate (block on critical unless --force) and adds defense-in-depth scanning in compile/pack flows.
  • Extends DiagnosticCollector with a security category + severity support, and updates docs/changelog accordingly.

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/unit/test_install_scanning.py | Unit tests for install-time pre-deploy scanning gate + diagnostics behavior |
| tests/unit/test_content_scanner.py | Unit tests for scanner character classification, positioning, and stripping |
| tests/unit/test_audit_command.py | Unit tests for apm audit behavior, exit codes, lockfile scanning, and --strip |
| src/apm_cli/utils/diagnostics.py | Adds security diagnostics category with typed severity + rendering |
| src/apm_cli/utils/console.py | Adjusts Rich printing to avoid unintended markup parsing |
| src/apm_cli/security/content_scanner.py | New scanning engine for hidden Unicode detection + stripping |
| src/apm_cli/security/__init__.py | Exposes scanner symbols via package exports |
| src/apm_cli/compilation/claude_formatter.py | Scans generated Claude commands before writing |
| src/apm_cli/compilation/agents_compiler.py | Scans generated CLAUDE.md output before writing |
| src/apm_cli/commands/install.py | Adds _pre_deploy_security_scan() gate and wires it into install paths |
| src/apm_cli/commands/compile.py | Scans compiled outputs before writing to disk |
| src/apm_cli/commands/audit.py | New apm audit command implementation + helpers |
| src/apm_cli/cli.py | Registers the new audit command |
| src/apm_cli/bundle/packer.py | Scans files before bundling during apm pack |
| docs/src/content/docs/reference/cli-commands.md | Documents apm audit and scanning behavior in install/compile/pack |
| docs/src/content/docs/enterprise/security.md | Updates enterprise security model with scanning gates + threat model |
| docs/src/content/docs/enterprise/governance.md | Adds governance guidance for apm audit workflows/exit codes |
| README.md | Mentions content security + apm audit in feature list |
| CHANGELOG.md | Adds Unreleased entry for the feature set |
| .gitignore | Ignores .copilot/ directory |

danielmeppiel and others added 2 commits March 15, 2026 17:33
- packer.py: Replace logging.warning with _rich_warning so pack-time
  scan findings are visible to users
- agents_compiler.py: Replace logging.warning with all_warnings.append
  so CLAUDE.md findings surface through CompilationFormatter
- claude_formatter.py: Hoist ContentScanner import outside write loop
- compile.py: Remove unused has_crit variable from classify() call
- install.py: Remove duplicate installed_packages.append in blocked
  local-package path (was already appended at line 1291)
- test_install_scanning.py: Add try/except OSError pytest.skip guard
  for symlink test on platforms that don't support symlinks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- agents_compiler.py: Move ContentScanner import before the
  content_map write loop (was inside loop, flagged pattern)
- test_content_scanner.py: Add latin-1 encoding and BOM+critical
  combo edge case tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Content security scanning for prompt files (hidden Unicode detection)

2 participants