Agent Activity Graph

If an AI agent approves an invoice, most systems can tell you that the workflow completed.

Far fewer can tell you the controlled runtime path that produced that outcome:

what the agent did
when it did it
under whose authority
in which workflow step
against which system
what policy allowed, blocked, or escalated
where a human stepped back in
how to replay the sequence later

Agent Activity Graph exists to make that gap impossible to miss.

Agent Activity Graph is a compact, local-first reference product for runtime evidence and human review in policy-gated agent workflows.

Tracing tells you what ran. Review-ready evidence tells you what a human can approve.

It argues that orchestration alone is incomplete. Once agents touch real business processes, enterprises need a runtime truth layer, not just prompts, tool calls, or dashboards.

The wedge is narrow on purpose:

Agent Activity Graph is the review system of record for blocked or escalated agent actions.

This repository proves that the missing layer can be demonstrated with a small, local-first, technically credible system:

one workflow scenario
one canonical event model
one policy engine
one replay surface
one exportable evidence pack

It is not a chatbot. It is not a workflow builder. It is not process mining. It is not an agent framework.

Why Current Agentic Systems Are Incomplete

Most agentic systems concentrate on four things:

prompt quality
tool use
orchestration
outcome reporting

Those layers matter. They are still not enough.

The moment an agent participates in finance, procurement, operations, or support work, the hard question changes from "can the agent act?" to "can we explain the controlled runtime path of that action later?"

Without a runtime evidence layer:

policy disappears into application code or prompt text
incident review becomes log archaeology
process teams cannot see where authority changed hands
engineering cannot replay the workflow in business terms
governance sees outcomes, but not the exact controlled path that produced them

Runtime evidence is the missing layer between orchestration and trustworthy operational use.

What This Project Proves

Agent Activity Graph proves five things:

Policy can be preserved as runtime evidence, not hidden side-effect.
Replay is a stronger debugging and governance surface than dashboards alone.
Human intervention needs to appear in the same trace as agent activity.
The real product moment is the human review case created by a blocked or escalated action.
A local-first reference implementation is enough to make the category legible.

Ultra-Clear Quickstart

Requirements:

Python 3.11+

Fastest path from a fresh clone:

make demo
make run

make demo creates .venv, installs dependencies, and seeds deterministic demo data.

Equivalent step-by-step path:

make install
make seed
make run

Then open these pages in order:

The public walkthrough is deterministic. Running make seed refreshes the same workflow IDs and incident IDs used below so the demo stays stable.

Run tests:

make test

Run the sample trace proof path:

make demo
make run
make ingest-trace

By default, make ingest-trace imports examples/traces/openai_agents_invoice_review.json into the running app.

That fixture is public-source-derived rather than private demo data. It is adapted from current official OpenAI Agents tracing and approval patterns plus the MCP tools approval model.

Run a real local approval-agent export for free:

make generate-local-trace
make ingest-local-trace

This path uses a locally running Ollama model to execute the approval-review loop and writes examples/traces/ollama_local_invoice_review.json. It is a real runtime-generated export, not a hand-authored fixture.

Grade a trace or workflow against the review-readiness spec:

make review-readiness
make review-readiness WORKFLOW_ID=wf-invoice-3001
make benchmark-review-readiness

That writes the current benchmark table to docs/review-readiness-benchmark.md.

Regenerate the preview assets used in this README:

make assets

On the first run, make assets installs Chromium for Playwright and captures the current live UI from a temporary seeded local server.

Visual Preview

These preview assets are captured from the live seeded UI, not hand-composed mockups:

Home

Workflow Detail

Replay

Incident

Evidence Pack

2-Minute Walkthrough

If you are evaluating the project quickly, click these pages in this order.

/ Learn the thesis immediately: this is not another trace viewer. It is a runtime evidence layer for policy-gated human review.
/reviews Start with the review queue. This is the operating surface for policy-gated agent actions, not just a list of workflows.
/reviews/review-wf-invoice-3001 Open the blocked invoice review case. This page ties together authority owner, review owner, due-by, incident linkage, and review-readiness score.
/workflows/wf-invoice-3001/replay Move to the signature surface. The replay shows the agent action, policy stop, blocked path, human intervention, and final outcome as one evidence sequence.
/incidents/inc_evt_wf_3001_05 Open the incident generated by that same run. This page centers the trigger event, business consequence, findings, and nearby evidence window.
/incidents/inc_evt_wf_3001_05/evidence-pack Finish with the exportable artifact. The evidence pack turns the same stored chronology into a review document for engineering, governance, or process owners.

Optional supporting page:

/workflows/wf-invoice-3001 Use the workflow detail page when you want the broader run context around the same review case.

A strong evaluator should come away with one conclusion:

yes, orchestration alone is missing a runtime evidence layer.

More specifically:

the missing surface is the review loop around policy-gated agent actions.

Review Readiness Spec

Agent Activity Graph now includes a versioned review-readiness spec:

authority completeness
policy completeness
human review completeness
provenance completeness
evidence integrity

This is the named proof artifact in the repo. It lets the project say something stronger than "we captured a trace":

we can tell you whether that trace is fit for human review.

See:

5-Minute Proof Path

If you want to prove this is more than a seeded story:

Start the app with make demo and make run.
In another terminal, run make ingest-trace.
Open the printed replay and incident URLs.
Confirm the trace-derived run now shows:
- source trace reference
- explicit policy gate
- review case
- human decision
- evidence status

This is the shortest path from “reference product” to “review layer sitting above a public-source-derived trace shaped by current agent runtime patterns.”

The benchmark path uses the same narrow bridge:

one OpenInference-style ingestion contract
explicit OpenAI Agents and MCP semantics
one readiness grade that tells you if the result is actually fit for human review

Public Trace Basis

The default trace proof path is grounded in official public sources, not invented SDK terminology:

OpenAI Agents tracing docs: openai.github.io/openai-agents-python/tracing
OpenAI Agents human-in-the-loop example: github.com/openai/openai-agents-python/.../human_in_the_loop.py
OpenAI Agents hosted MCP approval example: github.com/openai/openai-agents-python/.../on_approval.py
MCP tools spec: modelcontextprotocol.io/specification/2025-06-18/server/tools

The fixture itself is documented in examples/traces/README.md.

If you have Ollama running locally, the repo also supports a no-cost proof path that generates a fresh runtime export from an actual local approval agent.

Demo Scenario

The project uses exactly one scenario:

invoice approval workflow with agent participation

Three seeded runs make the argument:

Normal path
- AP inbox receives a low-value invoice with a valid PO
- agent classifies and validates it
- policy gate allows autonomous continuation
- agent approves within delegated authority
- ERP schedules payment
Escalated path
- AP inbox receives a same-day, high-value consulting invoice
- agent prepares the approval case
- policy gate escalates because the amount exceeds delegated authority
- finance director approves
- ERP schedules payment before cut-off
Blocked path
- AP inbox receives an exception invoice with no PO
- agent prepares a recommendation anyway
- policy gate blocks the path
- agent attempts a disallowed approval action
- AP manager rejects and places the invoice on manual hold

The scenario carries real business consequence:

missing the payment cut-off delays settlement
crossing the approval boundary creates policy breach risk
unsafe paths become incident records and exportable evidence packs

Signature Feature: Replay

Replay is the center of the product.

It is designed to make one workflow instance legible in seconds:

the first meaningful agent action
the policy gate and its verdict
any blocked or escalated branch
the human handoff
the final business outcome
the consequence of delay or breach

The goal is not visual flourish. The goal is to reconstruct the workflow in a way engineering, governance, and process teams can all use.

Why This Is Not Another Trace Viewer

Generic trace viewers already exist and are improving quickly.

Agent Activity Graph is intentionally downstream of that layer.

Its job is to answer a different question:

when an agent action is blocked or escalated, what is the business review record?

That is why the core objects are not spans, dashboards, or eval charts. The core objects are:

policy decision
review case
incident
replay
exportable evidence pack

Exportable Evidence Pack

For every incident, the app can generate a local-first evidence pack with:

executive summary
business consequence
findings
chronology
recommended actions

It is intentionally static and printable. No SaaS flow, no external dependency, no fake platform wrapper.

What Counts As Review-Ready Evidence

This repository is intentionally strict about the boundary between a useful trace and review-ready evidence.

The strongest runs include:

source trace reference
explicit authority subject
authority delegation source
policy rule IDs
review case ID
human decision reason when a human resolves the case
chained evidence hashes across the stored event stream

If a trace is missing some of those fields, Agent Activity Graph can still ingest it, replay it, and expose it for review. But the run is marked as needs_enrichment instead of pretending the record is already complete.

That is deliberate. The product should under-claim, not over-claim.

Why One Workflow Is Still Enough

The repository still stays inside one invoice approval workflow.

That is not because the pattern only applies to AP. It is because the product question is broader than the scenario:

what happens when an agent action touches a controlled write path
policy intervenes
a human has to review the case
the final record has to be replayable and exportable later

That same review pattern can sit above procurement exceptions, support write actions, operations changes, or security approvals. The repo does not add those scenarios yet because the current job is to prove the control loop, not to broaden the demo surface.

Architecture

workflow event
    |
    v
FastAPI ingestion
    |
    v
Pydantic validation
    |
    v
policy evaluation
    |
    v
SQLite evidence store
    |
    +--> trace-to-workflow adapter (OpenInference / OpenAI Agents / MCP)
    |
    +--> workflow summary
    +--> incident record
    +--> activity graph
    +--> replay timeline
    +--> evidence pack

The important design decision is that every surface is derived from the same stored evidence stream.

The current repository also includes one narrow interoperability contract:

POST /api/traces/openinference

There is also a local helper for it:

.venv/bin/python scripts/ingest_trace.py --file examples/traces/openai_agents_invoice_review.json

It maps OpenInference-style spans, including OpenAI Agents traces and MCP tool calls, into the same canonical workflow evidence model. The trace alone is not treated as sufficient. Authority, policy, and review annotations still have to become explicit before the run is accepted as business evidence.

Repository Shape

agent-activity-graph/
  README.md
  pyproject.toml
  .env.example
  Makefile
  docs/
    assets/
  src/agent_activity_graph/
  tests/
  scripts/

Key modules:

src/agent_activity_graph/sdk/events.py
src/agent_activity_graph/policy/evaluator.py
src/agent_activity_graph/db/repository.py
src/agent_activity_graph/replay/timeline.py
src/agent_activity_graph/replay/evidence_pack.py
src/agent_activity_graph/demo/scenarios.py
scripts/generate_assets.py

Useful Endpoints

POST /api/events
POST /api/traces/openinference
GET /api/workflows
GET /api/workflows/{workflow_id}
GET /api/workflows/{workflow_id}/graph
GET /api/workflows/{workflow_id}/replay
GET /api/incidents
GET /api/incidents/{incident_id}
GET /api/incidents/{incident_id}/evidence-pack

Public Summary

Short release-ready language lives in docs/release-summary.md.

Documentation

What Would Strengthen This Further

The repo no longer depends on a purely hand-authored trace fixture. It now includes a public-source-derived proof path grounded in current OpenAI Agents and MCP patterns, plus a free local runtime-generated export path via Ollama.

The next proof step is still a real export from an approval-like agent run.

Best case:

a JSON export already shaped like TraceIngestionRequest

Still workable:

raw span JSON plus:
- workflow step names
- business object ID
- policy decision point
- final human review decision
- delegated authority text

That is not required to understand the repo anymore. It is just the next bar for proving the wedge against a live enterprise stack.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agent Activity Graph

Why Current Agentic Systems Are Incomplete

What This Project Proves

Ultra-Clear Quickstart

Visual Preview

Home

Workflow Detail

Replay

Incident

Evidence Pack

2-Minute Walkthrough

Review Readiness Spec

5-Minute Proof Path

Public Trace Basis

Demo Scenario

Signature Feature: Replay

Why This Is Not Another Trace Viewer

Exportable Evidence Pack

What Counts As Review-Ready Evidence

Why One Workflow Is Still Enough

Architecture

Repository Shape

Useful Endpoints

Public Summary

Documentation

What Would Strengthen This Further

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
docs		docs
examples/traces		examples/traces
scripts		scripts
src/agent_activity_graph		src/agent_activity_graph
tests		tests
.env.example		.env.example
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Agent Activity Graph

Why Current Agentic Systems Are Incomplete

What This Project Proves

Ultra-Clear Quickstart

Visual Preview

Home

Workflow Detail

Replay

Incident

Evidence Pack

2-Minute Walkthrough

Review Readiness Spec

5-Minute Proof Path

Public Trace Basis

Demo Scenario

Signature Feature: Replay

Why This Is Not Another Trace Viewer

Exportable Evidence Pack

What Counts As Review-Ready Evidence

Why One Workflow Is Still Enough

Architecture

Repository Shape

Useful Endpoints

Public Summary

Documentation

What Would Strengthen This Further

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages