spec: add §9 session-level aggregation extension (Closes #3) by nanookclaw · Pull Request #4 · willau95/atlast-ecp

nanookclaw · 2026-03-31T04:04:49Z

Summary

Implements the §9 Session-Level Aggregation spec section proposed in #3.

What's Added

§9 Session-Level Aggregation (ECP-SPEC.md)

A new optional session_summary record type that aggregates per-action behavioral flags into a session-level view:

{
  "ecp": "1.0",
  "id": "sess_<hex32>",
  "type": "session_summary",
  "agent": "did:ecp:...",
  "session_start": 1741766400000,
  "session_end": 1741769000000,
  "record_count": 47,
  "flag_totals": { "retried": 3, "incomplete": 1, "hedged": 12, "error": 0, ... },
  "delivery_score": 0.894,
  "calibration_flag_rate": 0.255,
  "prev_session": "sess_..."
}

Two derived metrics:

delivery_score = 1 - (retried + incomplete + error) / record_count — fraction of actions completed without failure
calibration_flag_rate = hedged / record_count — hedging frequency (explicitly marked as neutral/context-dependent)

Cross-session drift via prev_session chaining:
Walk the chain to compute slope of delivery_score across sessions. O(n) where n = session count, versus O(total records) for raw rescanning. Makes gradual behavioral regression visible even when no single session crosses a flag threshold.

Design Decisions

Strictly additive: zero changes to existing record format. Readers MUST NOT reject batches containing session_summary records.
All fields derivable from §3 flags: no new data sources, no LLM-as-Judge, no self-reporting — consistent with ECP's core design principle.
Minimal formula: delivery_score uses only the three clearly-negative flags (retried, incomplete, error). hedging is kept separate as a neutral signal to avoid encoding assumptions about what hedging rate is "good".
Server semantics: §9.4 specifies that servers MUST NOT use summary records to override individual record flags — summaries are derived, not authoritative.

What This Enables

An operator can answer "is this agent's delivery rate declining over 30 days?" by walking a 30-node chain of session summaries rather than rescanning potentially thousands of raw records.

Closes #3

Add optional session_summary record type enabling: - Per-session delivery_score: 1 - (retried + incomplete + error) / record_count - calibration_flag_rate: hedged / record_count (neutral context-dependent signal) - prev_session chaining for O(n) cross-session drift detection All derived metrics are computable from existing §3 behavioral flags with no new data sources required. session_summary is strictly additive — existing record format and validation rules are unchanged. Motivation: per-action flags can't surface gradual behavioral regression across hundreds of sessions. A linked chain of session summaries makes slope-based drift detection possible without rescanning raw records. Implementation notes in §9.4 cover server storage semantics and writer/reader obligations.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spec: add §9 session-level aggregation extension (Closes #3)#4

spec: add §9 session-level aggregation extension (Closes #3)#4
nanookclaw wants to merge 1 commit intowillau95:mainfrom
nanookclaw:feat/session-level-aggregation

nanookclaw commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nanookclaw commented Mar 31, 2026

Summary

What's Added

§9 Session-Level Aggregation (ECP-SPEC.md)

Design Decisions

What This Enables

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant