perf(assembly): optimize call graph toposort from O(V*E) to O(V+E)#2942

Open
giwaov wants to merge 2 commits into 0xMiden:next from giwaov:perf/toposort-linear-time

Contributor

@giwaov giwaov commented Mar 30, 2026

Closes #2830

Summary

The CallGraph::toposort() and toposort_caller() implementations used nested loops that made them O(V*E), which is costly on large call graphs. This PR replaces them with a proper O(V+E) implementation of Kahn's algorithm.

Problem

  • num_predecessors() scanned all nodes and edges to count the predecessors of a single node, and was called inside the main Kahn's loop on every edge removal: O(E) per call, O(V*E) total.
  • toposort() cloned the entire graph just to destructively remove edges during the sort.
  • Cycle detection used Vec::contains() (O(n) per check).
  • reverse_reachable() scanned all edges on every BFS step: O(V*E) total.

Fix

  • Pre-compute an in-degree map at the start of each sort. Decrementing an in-degree counter is O(1), versus O(E) for num_predecessors().
  • Eliminate graph clones: the in-degree map tracks state instead of destructive edge removal.
  • Use BTreeSet for visited-node lookups in cycle detection: O(log n) vs. O(n).
  • Build a reverse adjacency map in reverse_reachable(): a single O(V+E) pass instead of O(V*E) scanning.
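
For readers unfamiliar with the technique, here is a minimal sketch of Kahn's algorithm driven by a pre-computed in-degree map. It is not the code in this PR: the `Graph` alias, the `u32` node ids, and the error shape are assumptions made for illustration, and the `BTreeMap` used here adds a logarithmic factor to each counter update that a hash map or dense index would avoid.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical stand-in for the call graph: caller id -> callee ids.
type Graph = BTreeMap<u32, Vec<u32>>;

/// Returns the nodes in topological order (callers before callees), or the
/// set of nodes that could never be emitted because they sit behind a cycle.
fn toposort(graph: &Graph) -> Result<Vec<u32>, BTreeSet<u32>> {
    // Count incoming edges once, in a single pass over every node and edge.
    let mut in_degree: BTreeMap<u32, usize> = BTreeMap::new();
    for (&node, callees) in graph {
        in_degree.entry(node).or_insert(0);
        for &callee in callees {
            *in_degree.entry(callee).or_insert(0) += 1;
        }
    }
    let num_nodes = in_degree.len();

    // Roots are the nodes with no incoming edges.
    let mut queue: VecDeque<u32> = in_degree
        .iter()
        .filter(|(_, degree)| **degree == 0)
        .map(|(&node, _)| node)
        .collect();

    let mut output = Vec::with_capacity(num_nodes);
    while let Some(node) = queue.pop_front() {
        output.push(node);
        // "Remove" each outgoing edge by decrementing a counter; the graph
        // itself is never cloned or mutated.
        if let Some(callees) = graph.get(&node) {
            for &callee in callees {
                let degree = in_degree
                    .get_mut(&callee)
                    .expect("every callee was counted above");
                *degree -= 1;
                if *degree == 0 {
                    queue.push_back(callee);
                }
            }
        }
    }

    // Any node that was never emitted is blocked behind a cycle.
    if output.len() != num_nodes {
        let visited: BTreeSet<u32> = output.into_iter().collect();
        return Err(in_degree
            .into_keys()
            .filter(|node| !visited.contains(node))
            .collect());
    }
    Ok(output)
}
```

In this sketch a graph with no roots at all (every node on a cycle) leaves the queue empty from the start, so the length check reports all nodes as cyclic.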

Testing

All 9 existing callgraph unit tests pass unchanged. Full miden-assembly test suite (180 tests) passes with 0 failures.

Collaborator

@huitseeker huitseeker left a comment

Overall LGTM, though this will need a rebase.

.any(|(n, out_edges)| !output.contains(n) || !out_edges.is_empty());
if has_cycle {
// If not all nodes were visited, the remaining nodes participate in cycles
if output.len() != num_nodes {
Collaborator

Could we add a small regression test for the whole-graph cycle case with no initial roots? That was the main shape I wanted pinned down around this new cycle path.

#[test]
fn callgraph_toposort_whole_graph_cycle_without_roots() {
    let graph = callgraph_cycle_without_roots();
    let err = graph.toposort().expect_err(
        "expected topological sort to fail when every node is blocked behind a cycle",
    );
    assert_eq!(err.0.into_iter().collect::<Vec<_>>(), &[A1, A2, A3]);
}

Contributor Author

Done. Added the callgraph_toposort_whole_graph_cycle_without_roots test and rebased onto next.

@huitseeker huitseeker requested a review from bitwalker April 1, 2026 10:42
…xMiden#2830)

Replace the O(V*E) Kahn's algorithm implementation in CallGraph with a proper
O(V+E) version:

- Pre-compute in-degree map instead of rescanning all edges via
  num_predecessors() on each edge removal
- Eliminate graph clone in toposort() by tracking in-degrees rather than
  destructively removing edges
- Eliminate graph clone in toposort_caller() using the same technique
- Build reverse adjacency map in reverse_reachable() instead of scanning
  all edges per BFS step
- Use BTreeSet for cycle detection instead of Vec::contains()
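
The reverse adjacency map mentioned in the list above can be sketched as follows. This is an illustration under assumptions, not the PR's code: the `Graph` alias, the `u32` node ids, the function signature, and the choice to include the starting nodes in the result are all hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical stand-in for the call graph: caller id -> callee ids.
type Graph = BTreeMap<u32, Vec<u32>>;

/// Returns every node that can reach one of `roots` by following call edges,
/// including the roots themselves (an assumption of this sketch).
fn reverse_reachable(graph: &Graph, roots: &[u32]) -> BTreeSet<u32> {
    // Build callee -> callers once, in a single pass over all edges.
    let mut callers: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for (&caller, callees) in graph {
        for &callee in callees {
            callers.entry(callee).or_default().push(caller);
        }
    }

    // BFS over the reversed edges; each step looks up only the direct
    // callers of the current node instead of rescanning the whole graph.
    let mut seen: BTreeSet<u32> = roots.iter().copied().collect();
    let mut queue: VecDeque<u32> = roots.iter().copied().collect();
    while let Some(node) = queue.pop_front() {
        if let Some(preds) = callers.get(&node) {
            for &pred in preds {
                // `insert` returns true only the first time a node is seen.
                if seen.insert(pred) {
                    queue.push_back(pred);
                }
            }
        }
    }
    seen
}
```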
@giwaov giwaov force-pushed the perf/toposort-linear-time branch from 9178273 to d77844b Compare April 1, 2026 12:18
Collaborator

@huitseeker huitseeker left a comment

🙏

Development

Successfully merging this pull request may close these issues.

Call graph topological sort scales superlinearly