
Measured boot attestation with UKI policy #35

Merged
phaer merged 39 commits into main from mb-policy on Apr 15, 2026


@phaer phaer (Owner) commented Mar 25, 2026

Switch from raw PCR digest comparison to UEFI event log validation. A custom UKI keylime policy parses the TPM's event log and checks individual events instead of
comparing PCR hashes. This means firmware config changes (boot order, BIOS settings) no longer break attestation, because they can be ignored, and when something does change we have a better chance to understand why.

Keylime patches:

Other:

  • nixpkgs bump to nixos-unstable + systemd 259 fixes (TPM2 LUKS, PCR 9 NvPCR)
  • Restore explicit PCR 7 binding for LUKS partitions
  • Removed old pcr-policy package (superseded)

Trust model

Agent self-reports its event log on first boot (TOFU). After that, the verifier replays the log against the refstate and validates the TPM quote every cycle. PCRs 9 and 11 are NOT part of the event log replay, because userspace extends them at runtime (systemd-tpm2-setup for PCR 9, systemd-pcrphase for PCR 11). PCR 11 is still covered by the TPM quote, while PCR 9 should be redundant in our case (UKI boot); value consistency is enforced across attestations.
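The replay step can be pictured with a short sketch. It is an illustration, not keylime's code: it assumes event log entries are already parsed into `(pcr_index, sha256_digest)` pairs, that every PCR starts at 32 zero bytes (the usual reset value; firmware that starts at locality 3 would need a different PCR 0 start value), and it hard-codes the relevant set {0, 1, 2, 3, 4, 5, 7} that the uki policy's get_relevant_pcrs() returns, so PCRs 9 and 11 are never compared. The function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Set, Tuple

# PCRs the uki policy replay-checks; 9 and 11 are left out because
# userspace extends them after the UEFI event log is complete.
RELEVANT_PCRS: Set[int] = {0, 1, 2, 3, 4, 5, 7}


def replay_event_log(events: Iterable[Tuple[int, bytes]],
                     relevant: Set[int] = RELEVANT_PCRS) -> Dict[int, bytes]:
    """Recompute SHA-256 PCR values from (pcr_index, digest) events."""
    pcrs: Dict[int, bytes] = {}
    for index, digest in events:
        if index not in relevant:
            continue  # not replay-checked
        old = pcrs.get(index, bytes(32))  # assume PCRs reset to all zeros
        # TPM2 extend operation: new = SHA256(old || digest)
        pcrs[index] = hashlib.sha256(old + digest).digest()
    return pcrs


def mismatched_pcrs(replayed: Dict[int, bytes],
                    quoted: Dict[int, bytes]) -> Set[int]:
    """Return the PCR indices whose replayed value differs from the quote."""
    return {i for i, value in replayed.items() if quoted.get(i) != value}
```

Any index returned by `mismatched_pcrs` means the log and the quoted TPM state disagree, so the event-level checks cannot be trusted for that PCR.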

@phaer phaer changed the title from "Mb policy" to "Measured boot attestation with UKI policy" on Mar 25, 2026
phaer added 29 commits April 13, 2026 14:28
Add logLevelOverrides option to suppress noisy per-request INFO
logging from keylime.web and keylime.authorization.manager.
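How the module translates logLevelOverrides into keylime's logging config is not shown here. At the Python level the effect is roughly the following, assuming the overrides map logger names to level names (the dict shape and helper name are hypothetical):

```python
import logging
from typing import Mapping

# Hypothetical shape of the override map: logger name -> level name.
LOG_LEVEL_OVERRIDES: Mapping[str, str] = {
    "keylime.web": "WARNING",
    "keylime.authorization.manager": "WARNING",
}


def apply_log_level_overrides(overrides: Mapping[str, str]) -> None:
    """Raise per-logger thresholds so per-request INFO lines are dropped."""
    for name, level in overrides.items():
        # Logger.setLevel accepts level names such as "WARNING".
        logging.getLogger(name).setLevel(level.upper())
```

Child loggers left at NOTSET (for example a hypothetical `keylime.web.handler`) inherit the raised threshold, so one entry per subsystem is enough.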
Add libefivar for event log enrichment, enable the measured boot
policy, send refstate from agent to auto-enrollment server, and
update tests for measured boot enrollment.
Custom MBA policy for UKI boot chains (systemd-boot + UKI).
Validates SCRTM/firmware (PCR 0), Secure Boot keys (PCR 7),
UKI application digest (PCR 4), and UKI PE section
measurements (PCR 11). Accepts expected variability in PCR 1.

Includes create-uki-refstate tool to generate the reference
state from a binary UEFI event log.
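The PCR 4 check can be sketched as "the pinned UKI digest must appear among the EFI applications measured into PCR 4". The event dict layout (`PCRIndex`, `EventType`, a hex SHA-256 `Digest`) is a simplified assumption, as is every refstate field except `uki_digest`, which this PR names; the helper names are hypothetical.

```python
from typing import Iterable, List, Mapping

APP_EVENT = "EV_EFI_BOOT_SERVICES_APPLICATION"


def pcr4_application_digests(events: Iterable[Mapping]) -> List[str]:
    """Collect SHA-256 digests of EFI applications measured into PCR 4."""
    return [
        e["Digest"].lower()
        for e in events
        if e["PCRIndex"] == 4 and e["EventType"] == APP_EVENT
    ]


def check_uki_digest(events: Iterable[Mapping], refstate: Mapping) -> bool:
    """Accept the log only if the pinned UKI digest was measured in PCR 4."""
    return refstate["uki_digest"].lower() in pcr4_application_digests(events)
```

systemd-boot itself is also measured into PCR 4 as an application, which is why the sketch checks membership instead of assuming a fixed position.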
Add measuredBootPolicyPath option to specify the directory
containing the MBA policy Python module.
Extract event log parsing into a shared Python library.
Rename packages for consistency:
  keylime-uki-policy -> keylime-measured-boot-policy
  pcr-policy -> measured-boot-state
Add libefivar to measure-boot-state wrapper for device path decoding.
Replays the UEFI event log, compares PCRs against the TPM,
and diffs refstates to diagnose attestation mismatches.
18 policy tests + 21 library tests, run as nix flake checks.
Apply 4 upstream patches (3 keylime, 1 rust-keylime):
- Check tpm2_eventlog exit code instead of stderr
- Use policy's get_relevant_pcrs() for PCR replay
- Bypass ORM cache for uefi_ref_state
- Cache UEFI event log bytes at agent startup
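The idea behind the first patch, trusting the exit status instead of treating any stderr output as failure, looks roughly like this. The helper name is hypothetical and this is not the patched keylime code; the only tpm2_eventlog usage assumed is passing it the path of the binary event log.

```python
import subprocess
import sys
from typing import Sequence


def run_parser(argv: Sequence[str]) -> str:
    """Run an event log parser; fail on a non-zero exit, not on stderr text.

    Warnings on stderr (such as tpm2_eventlog's PCR 11 digest warning)
    are passed through instead of being treated as a parse failure.
    """
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    if proc.stderr:
        sys.stderr.write(proc.stderr)  # surface warnings, don't fail on them
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with status {proc.returncode}")
    return proc.stdout


# e.g. run_parser(["tpm2_eventlog", "/sys/kernel/security/tpm0/binary_bios_measurements"])
```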

Set measured_boot_evaluate=always and add negative attestation
test (tampered UKI digest is rejected).
Drop cached measured boot reports for agents that are no longer
registered.  Without this, removing an agent and rebooting with a
new image can race: the daemon re-enrolls with the old refstate
before the fresh report arrives, leaving the agent stuck with a
wrong UKI digest.
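A minimal sketch of the pruning step, assuming the daemon keeps reports in a dict keyed by agent UUID and can fetch the registrar's current UUID list (both hypothetical here):

```python
from typing import Dict, Iterable


def prune_stale_reports(reports: Dict[str, dict],
                        registered_uuids: Iterable[str]) -> Dict[str, dict]:
    """Drop cached measured boot reports for agents no longer registered.

    A removed agent's old report is discarded while it is unregistered,
    so after re-registering it is enrolled from its fresh report.
    """
    registered = set(registered_uuids)
    for uuid in list(reports):  # copy keys: we delete while iterating
        if uuid not in registered:
            del reports[uuid]
    return reports
```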
Commands:
- status:  list agents with registrar/verifier enrollment state
- inspect: show detailed agent info including refstate summary
- remove:  delete agent from verifier and registrar (or 'all')

Reads server address from attestation-server.json, client certs
from keys/keylime/.  Available as 'nix run .#attestation-ctl'
and in the dev shell.
systemd services (systemd-tpm2-setup, systemd-pcrphase) extend PCRs
from userspace via the TSS2 library.  These extensions are not in the
UEFI event log but are recorded in /run/log/systemd/tpm2-measure.log.

Add parse_userspace_log() to the measured boot library and include
those events in replay_pcrs() so that PCR 9 and 11 replays match
the actual TPM state.
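As we understand systemd's documentation, tpm2-measure.log is JSON-SEQ (RFC 7464): each record starts with an ASCII RS byte and carries a `pcr` number plus a `digests` list of `hashAlg`/`digest` entries. Under that assumption (verify against the systemd docs for your version), a stand-in for parse_userspace_log() could look like this; the function name is hypothetical.

```python
import json
from typing import List, Tuple

RS = "\x1e"  # JSON-SEQ record separator (RFC 7464)


def parse_userspace_records(text: str, alg: str = "sha256") -> List[Tuple[int, bytes]]:
    """Extract (pcr, digest) pairs from a systemd userspace measurement log."""
    events: List[Tuple[int, bytes]] = []
    for chunk in text.split(RS):
        chunk = chunk.strip()
        if not chunk:
            continue  # leading separator or trailing newline
        record = json.loads(chunk)
        for d in record.get("digests", []):
            if d.get("hashAlg") == alg:
                events.append((int(record["pcr"]), bytes.fromhex(d["digest"])))
    return events
```

Because userspace extends PCRs only after the firmware has handed over, appending these pairs after the UEFI events keeps the per-PCR extend order correct for the replay.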
tpm2_eventlog 5.7 warns about EV_IPL events in PCR 11 because its
verify_digests() has no case for PCR 11 (only 8, 9, 12, 14).  Upstream
master (b25c9220) adds the missing case.  No release since 5.7, so we
build from git with the bootstrap step inlined.
Define all custom packages once in packages/default.nix and thread
them through _module.args.customPackages instead of repeating
callPackage in every module and test.  Ensures a single tpm2-tools
in the system closure without using a nixpkgs overlay.
Show PASS/FAIL/TIMEOUT/PENDING based on consecutive_attestation_failures,
last_successful_attestation, and attestation_count directly.  The verifier's
operational_state can be stale after agent reboots due to ORM commit issues.

Also add LAST OK and attestation count columns to the status table.
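The classification could be expressed as below. The precedence order, the timeout rule and the epoch-seconds timestamps are assumptions for illustration; the PR only names the three input fields and the four labels.

```python
import time
from typing import Optional


def attestation_status(attestation_count: int,
                       consecutive_attestation_failures: int,
                       last_successful_attestation: Optional[float],
                       timeout_s: float,
                       now: Optional[float] = None) -> str:
    """Classify an agent from raw verifier counters (hypothetical rules).

    operational_state is deliberately ignored, since it can be stale
    after an agent reboot.
    """
    now = time.time() if now is None else now
    if attestation_count == 0:
        return "PENDING"  # enrolled, no attestation processed yet
    if consecutive_attestation_failures > 0:
        return "FAIL"
    last_ok = last_successful_attestation
    if last_ok is None or now - last_ok > timeout_s:
        return "TIMEOUT"  # nothing successful within the window
    return "PASS"
```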
Add 'save' subcommand that snapshots the current refstate to
/var/lib/keylime/saved-refstate.json (the persistent encrypted
partition).  'diagnose' now auto-detects a saved refstate when
run without --refstate.

Remove the 'diff' subcommand — its functionality is now available
via 'diagnose old.json new.json' with two positional arguments.
diagnose already handled the offline case (no TPM sysfs) gracefully.
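Comparing two saved refstates is essentially a recursive JSON diff. A sketch, assuming refstates are plain JSON objects (their schema is not shown in this PR); a key missing on one side is reported as `None`:

```python
import json
from typing import Any, Iterator, Tuple


def diff_json(old: Any, new: Any, path: str = "") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (path, old_value, new_value) for every differing leaf."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            yield from diff_json(old.get(key), new.get(key), f"{path}/{key}")
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        for i, (a, b) in enumerate(zip(old, new)):
            yield from diff_json(a, b, f"{path}[{i}]")
    elif old != new:
        yield path or "/", old, new


def diff_refstate_files(old_path: str, new_path: str) -> None:
    """Print differences between two refstate JSON files."""
    with open(old_path) as f_old, open(new_path) as f_new:
        for path, a, b in diff_json(json.load(f_old), json.load(f_new)):
            print(f"{path}: {a!r} -> {b!r}")
```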
Show the save-then-diagnose workflow, document auto-detection of
saved refstates, and replace the removed 'diff' subcommand with
'diagnose old.json new.json'.
Use db_manager.session() directly instead of session_context()
in the uefi_ref_state property. session_context() calls commit()
on exit, which flushes pending EvidenceItemMapping objects on the
shared scoped session, causing SQLAlchemy identity map conflicts.
The verifier maps both `verifiermain` and `mbpolicies` via two
independent SQLAlchemy ORM classes:

  - keylime.db.verifier_db (declarative_base, own MetaData)
    used by push_agent_monitor and cloud_verifier_tornado
  - keylime.models.verifier (model framework, own registry)
    used by tpm_engine and the push-mode attestation flow

Each has its own identity map and change tracking; SQLAlchemy
has no way to synchronize them.  Writes through one mapping are
invisible to the other's cached instances, breaking both reads
and writes that cross the boundary.

Replace the previous narrowly-scoped 0003 with two patches that
issue raw SELECT/UPDATE bypassing both ORM layers:

  - 0003: read mb_policy via raw SELECT (fixes stale policy after
    tenant DELETE+CREATE re-enrollment).
  - 0004: write accept_attestations via raw UPDATE (fixes
    push-mode timeout recovery — previously the column was
    silently dropped from the ORM UPDATE because the model
    framework's loaded state still showed True after
    push_agent_monitor flipped it to False via the legacy mapping).

The proper upstream fix is to consolidate to a single mapping per
table; until then this is the only mechanism that crosses the
mapping boundary.
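The workaround amounts to issuing plain SQL on a connection instead of going through either mapped class, so neither identity map can hold a stale copy of the column. A self-contained illustration with the standard sqlite3 module and a simplified, assumed `verifiermain` schema (the real patches use keylime's own database layer, and the helper names are hypothetical):

```python
import sqlite3


def set_accept_attestations(conn: sqlite3.Connection, agent_id: str, accept: bool) -> int:
    """Write accept_attestations with a raw UPDATE; return rows changed."""
    cur = conn.execute(
        "UPDATE verifiermain SET accept_attestations = ? WHERE agent_id = ?",
        (int(accept), agent_id),
    )
    conn.commit()
    return cur.rowcount


def get_accept_attestations(conn: sqlite3.Connection, agent_id: str) -> bool:
    """Read the current column value straight from the database."""
    row = conn.execute(
        "SELECT accept_attestations FROM verifiermain WHERE agent_id = ?",
        (agent_id,),
    ).fetchone()
    if row is None:
        raise KeyError(agent_id)
    return bool(row[0])
```

Because the statement names the column explicitly, it is always written, unlike an ORM flush that skips columns whose loaded state already matches the new value.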
Stop the agent long enough for push_agent_monitor to fire its
timeout (quote_interval x 5 = 10s with the test config),
verify the verifier marks the agent FAIL via attestation_status,
then restart the agent and verify it recovers to PASS within
the recovery window.  Asserts attestation_count > baseline so a
stale PASS reading cannot satisfy the check.

This guards against the dual-mapping write bug fixed by the
keylime 0004 patch (push-mode self-healing was silently broken
upstream — count grew but accept_attestations stayed False
forever).  Without the fix the test fails at the recovery
assertion; with the fix it passes in ~17 seconds.
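With the test config, the timeout of quote_interval x 5 = 10s implies a 2s quote interval. The recovery assertion can be sketched as a polling loop; `fetch_status` is a hypothetical callable returning a `(status, attestation_count)` pair, and requiring the count to exceed the baseline is what rules out a stale PASS.

```python
import time
from typing import Callable, Tuple


def wait_for_recovery(fetch_status: Callable[[], Tuple[str, int]],
                      baseline_count: int,
                      timeout_s: float,
                      poll_s: float = 1.0) -> int:
    """Poll until status is PASS with attestation_count above baseline."""
    deadline = time.monotonic() + timeout_s
    while True:
        status, count = fetch_status()
        if status == "PASS" and count > baseline_count:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no fresh PASS: last status={status}, count={count}")
        time.sleep(poll_s)
```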
The existing reboot loop only checked that the verifier still had
an enrollment record for agent_uuid, which is a weak signal — the
record persists regardless of what the agent does after reboot.

Query the registrar after each reboot and assert that
results.uuids equals exactly [agent_uuid].  If the swtpm's EPS
ever regenerates (state directory lost, TPM2_Clear, etc.) the
agent would re-register under a new EK-derived UUID and the
registrar would contain two entries, failing the assertion.
The upstream PR branches for the elparsing and tpm-relevant-pcrs fixes
have been rebased onto master commits newer than v7.14.1 and have grown
test coverage since we first packaged them.  Pin keylime to master
commit 4c2a0c6ca84c ("Switch from CA organization of MITLL to Keylime",
30 commits past v7.14.1) — the common base of all current branches —
and regenerate all four local patches against it.

The regenerated patches now include the test additions upstream grew
in the meantime (test/test_mba_parsing.py, test/test_tpm_check_pcrs.py,
test/test_tpm_engine.py) and the expanded scope of the tpm-relevant-pcrs
fix that now threads mb_policy_name through cloud_verifier_common,
cloud_verifier_tornado, da/attest and verification/tpm_engine.

Both VM tests (keylime and keylime-auto-enroll) still pass on the new
base, including the push-mode timeout recovery subtest.
fix/uefi-log-privileged-fd was rebased upstream and the resulting
commit now touches 6 files (adding plumbing through attestation.rs,
main.rs, state_machine.rs) instead of the 3 in our original snapshot.
Regenerate against commit 0d63e3b from the current branch tip.

The fetched source tag (v0.2.9) is unchanged; only the patch itself
expands in scope.
Link each local patch to its upstream PR (or tracking issue) so the
provenance of each is obvious from the package files alone:

  - 0001 elparsing        → keylime/keylime#1878
  - 0002 tpm-relevant-pcrs → keylime/keylime#1879
  - 0003/0004 dual-mapping → keylime/keylime#1880 (issue, no PR yet)
  - keylime-agent 0001    → keylime/rust-keylime#1223
systemd 259's new NvPCR support adds a runtime PCR 9 extension in
systemd-tpm2-setup.service (an anchoring measurement for the bundled
hardware.nvpcr and cryptsetup.nvpcr definitions) that is not captured
in the UEFI event log.  This makes the event-log-vs-live-PCR replay
check fail on every boot and breaks measured boot attestation.

Exclude PCR 9 from relevant_pcr_indices, the set used by keylime's
mb_pcrs_to_check() to decide which PCRs to replay-check.  The event-level
policy checks still run against whatever PCR 9 events are in
the UEFI event log, and the security-critical content of PCR 9 (the
UKI image) is already pinned via uki_digest in PCR 4.

This mirrors how we already handle PCR 11, which is runtime-extended
by systemd-pcrphase with boot phase strings.  Both PCRs share the
same problem (userspace extensions invisible to the UEFI event log)
and now share the same fix.

Alternative fixes considered and rejected:

  A. Masking the systemd-shipped .nvpcr files via /etc/nvpcr/ dead
     symlinks.  Works, but brittle: any future .nvpcr file added by
     upstream systemd silently re-breaks attestation.
  B. Capturing runtime extensions in the refstate via userspace_digests
     and patching the verifier to account for them during replay.
     Most principled, but the NvPCR anchor measurement is per-host
     (derived from a local secret), which would defeat image-wide
     attestation and force per-host refstates.
  C. Pinning the PCR 9 value via tpm_policy.  Same per-host problem.
  D. Disabling systemd-tpm2-setup.service entirely.  Would also break
     TPM2-bound LUKS unlock, which we rely on for /var/lib/keylime
     and /var/lib/credentials.
systemd 259 changed how tpm2_get_best_pcr_bank() selects the PCR hash
algorithm: it now reads the LoaderTpm2ActivePcrBanks EFI variable
(written by the UEFI boot manager via GetActivePcrBanks()). Without TPM2
support compiled into OVMF, GetActivePcrBanks() returns 0, causing
systemd to log 'Firmware reports neither SHA1 nor SHA256 PCR banks,
cannot operate.' and fail every TPM2 unseal with EOPNOTSUPP.

Fix the installer VM test by switching from the default OVMF (no TPM
support) to OVMF.override { tpmSupport = true; }. This enables the
-D TPM2_ENABLE OVMF build flag so GetActivePcrBanks() correctly reports
the active PCR banks to userspace. The integration VM test already used
OVMFFull, which has tpmSupport = true.

With OVMF TPM support enabled, systemd-tpm2-setup-early (gated on
ConditionSecurity=measured-uki, satisfied because the stub can now
extend PCRs) creates the ECC SRK at 0x81000001, and tpm2_get_best_pcr_bank()
succeeds, so no manual SRK provisioning is needed.
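The bank choice can be illustrated with the TCG EFI protocol's hash-algorithm bitmap (EFI_TCG2_BOOT_HASH_ALG_SHA1 = 0x1, EFI_TCG2_BOOT_HASH_ALG_SHA256 = 0x2). This sketch only mirrors the behaviour described above (prefer SHA-256, fall back to SHA-1, give up on 0); it is not systemd's code, and how the value is encoded in the LoaderTpm2ActivePcrBanks variable is not covered.

```python
# Bit values from the TCG EFI Protocol specification (EFI_TCG2_BOOT_HASH_ALG_*).
HASH_ALG_SHA1 = 0x00000001
HASH_ALG_SHA256 = 0x00000002


def best_pcr_bank(active_banks: int) -> str:
    """Pick a PCR bank from the firmware-reported active-bank bitmap."""
    if active_banks & HASH_ALG_SHA256:
        return "sha256"
    if active_banks & HASH_ALG_SHA1:
        return "sha1"
    # OVMF built without TPM2 support reports 0 here, so no bank qualifies.
    raise OSError("Firmware reports neither SHA1 nor SHA256 PCR banks, cannot operate.")
```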
systemd 258 bound tpm2-encrypted repart partitions to PCR 7 by default.
systemd 259 changed the default to an empty policy (no PCR restrictions),
silently dropping the secure-boot binding and allowing any holder of the
TPM to unseal the partitions.

Make the policy explicit with TPM2PCRs=7, consistent with how individual
credentials inside the partition are encrypted (--tpm2-pcrs=7 in
credential-storage.nix).
mainly to test a newer kernel on flaky test hardware
phaer added 5 commits April 13, 2026 14:28
…ate flake check

Replace the separate libraryTests flake check with pytestCheckHook in the
package's checkPhase. This is more idiomatic for nixpkgs Python packages
and ensures tests run on every build, not just when explicitly checked.

- Enable doCheck (was false) and add pytestCheckHook to nativeCheckInputs
- Remove libraryTests from unit-tests.nix and tests/default.nix
- Update README to reflect the new testing approach
… form

Replace the four pairs of hardcoded UEFI GUID strings with a helper
that computes the mixed-endian form by byte-reversing the first three
fields of the standard GUID.  Each GUID is now defined once in the
standard UEFI form; the mixed-endian variant (as seen in some
tpm2_eventlog output) is derived automatically.

This eliminates the risk of copy-paste errors between the two forms
and makes it easier to add new GUIDs in the future.

No Python efivar binding exists in nixpkgs, so a lightweight helper
is preferable to adding a C library dependency.
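The conversion itself is a fixed byte shuffle: the first three GUID fields (4, 2 and 2 bytes) are stored little-endian and the last 8 bytes keep their order. A sketch with a hypothetical helper name; returning plain lowercase hex is an assumption about how the policy compares GUIDs.

```python
def mixed_endian_guid(guid: str) -> str:
    """Return the mixed-endian hex form of a GUID given in standard UEFI form."""
    raw = bytes.fromhex(guid.replace("-", ""))
    if len(raw) != 16:
        raise ValueError(f"not a GUID: {guid!r}")
    # Reverse fields 1-3 (bytes 0-3, 4-5, 6-7); bytes 8-15 stay as-is.
    swapped = raw[3::-1] + raw[5:3:-1] + raw[7:5:-1] + raw[8:]
    return swapped.hex()


# EFI_GLOBAL_VARIABLE
print(mixed_endian_guid("8be4df61-93ca-11d2-aa0d-00e098032b8c"))
```

Python's standard `uuid.UUID(...).bytes_le` produces the same byte order, which makes a convenient cross-check in tests.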
Replace the runCommand that copies a single .py file with a
buildPythonPackage using pyproject.toml. This enables:

- Proper dependency management via setuptools
- Tests via pytestCheckHook (replaces the manual PYTHONPATH wiring
  in the policyTests flake check)
- Standard Python packaging conventions (pyproject.toml, etc.)

The policyPath output still provides a directory suitable for the
verifier's PYTHONPATH, now pointing at the package's site-packages.

Pass the custom keylime package through keylime-shared.nix so the
policy tests can import from keylime.mba.elchecking.
…tations

- user-guide.md: enrollment diagram said 'PCRs 0,1,2,3,7,11' but the
  daemon passes --mb_refstate with no explicit PCR list; the uki policy
  determines replay via get_relevant_pcrs() = {0,1,2,3,4,5,7}.
  PCR 11 is excluded from replay; PCRs 4 and 5 were missing. Replace
  with '--mb_refstate, uki policy' which is accurate and stable.

- keylime-auto-enroll.nix: same fix in the module header comment.

- docs.md: remove the 'measured boot & attestation' bullet from
  Limitations and Further Work — it described the current working
  implementation, not an outstanding gap. The feature is fully covered
  in the 'Remote Attestation (Keylime)' section.
@phaer phaer marked this pull request as ready for review April 13, 2026 12:57
phaer added 4 commits April 15, 2026 12:39
Add dispatcher entries for event types seen on real hardware that were
missing from the policy, causing attestation to fail with 'unexpected
(PCRIndex, EventType) combination':

- EV_POST_CODE in PCR 0 and PCR 2: older TCG type used by some firmware
  for POST code and option ROM measurements. Both PCRs are in
  relevant_pcr_indices so the quote comparison covers integrity.
- EV_EFI_PLATFORM_FIRMWARE_BLOB{,2} in PCR 2: some firmware measures
  UEFI drivers here rather than PCR 0. Routed to the same
  platform_firmware_blobs collector as PCR 0 events, consistent with
  measure-boot-state which collects from all PCRs.
- EV_EFI_VARIABLE_BOOT2 in PCR 1: newer UEFI spec variant of
  EV_EFI_VARIABLE_BOOT; treated the same as existing PCR 1 handlers.
- EV_EFI_ACTION in PCR 6: PCR 6 is not in relevant_pcr_indices and is
  absent from tpm_policy, so the handler prevents the dispatcher from
  rejecting events without providing an end-to-end integrity guarantee.
- EV_SEPARATOR extended from range(8) to range(16): some firmware emits
  separators for PCRs beyond 7 to mark the end of each measurement phase.
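The (PCRIndex, EventType) dispatch these entries extend can be sketched as a table lookup. Handler names and bodies are hypothetical (a real policy would, for example, verify separator contents); only the event types, PCR numbers, the platform_firmware_blobs collector name and the error wording come from this PR.

```python
from typing import Callable, Dict, List, Mapping, Tuple

Handler = Callable[[Mapping, Dict[str, List]], None]


def collect_firmware_blob(event: Mapping, state: Dict[str, List]) -> None:
    """Record firmware blob digests regardless of which PCR they landed in."""
    state.setdefault("platform_firmware_blobs", []).append(event["Digest"])


def accept(event: Mapping, state: Dict[str, List]) -> None:
    """Recognise the event without further checks (placeholder handler)."""


DISPATCH: Dict[Tuple[int, str], Handler] = {}
for pcr in (0, 2):
    DISPATCH[(pcr, "EV_POST_CODE")] = accept
    DISPATCH[(pcr, "EV_EFI_PLATFORM_FIRMWARE_BLOB")] = collect_firmware_blob
    DISPATCH[(pcr, "EV_EFI_PLATFORM_FIRMWARE_BLOB2")] = collect_firmware_blob
DISPATCH[(1, "EV_EFI_VARIABLE_BOOT")] = accept
DISPATCH[(1, "EV_EFI_VARIABLE_BOOT2")] = accept
DISPATCH[(6, "EV_EFI_ACTION")] = accept
for pcr in range(16):
    DISPATCH[(pcr, "EV_SEPARATOR")] = accept


def dispatch(events: List[Mapping]) -> Dict[str, List]:
    """Route each event to its handler; reject unknown combinations."""
    state: Dict[str, List] = {}
    for event in events:
        key = (event["PCRIndex"], event["EventType"])
        handler = DISPATCH.get(key)
        if handler is None:
            raise ValueError(f"unexpected (PCRIndex, EventType) combination: {key}")
        handler(event, state)
    return state
```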
systemd puts the console in UTF-8 mode at boot; the kernel then maps
Unicode code points through the font's Unicode table.  The default
VGA ROM font has no entries for U+2500+, so those glyphs render as '?'.

ter-v16n (Terminus) includes the full box-drawing range and is a
standard VGA-compatible bitmap font suitable for the Linux console.
@phaer phaer merged commit a3c0079 into main Apr 15, 2026
2 checks passed
@phaer phaer deleted the mb-policy branch April 15, 2026 15:09