
analyzer: report alternative transitive call paths in JSON output#297

Open
battuto wants to merge 6 commits into google:main from battuto:fix-alternative-transitive-paths

battuto commented Mar 12, 2026

Problem

When using -output json (the default, most-readable output mode), capslock reports only one call path per (function, capability) pair. This is because forEachPath runs a backward BFS from capability nodes and records a single bfsState per visited node. The first path discovered wins; any alternative transitive path through a different intermediate function is silently dropped.
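To illustrate the failure mode, here is a toy Go sketch of a backward BFS that keeps one successor per visited node. It is not capslock's forEachPath: the `callers` map, the `backwardBFS` and `pathFrom` helpers, and the simplified node names are hypothetical. Because `handleServerConn` is first reached through its direct edge, the later route through the com chain hits an already-visited node and is dropped.

```go
package main

import "fmt"

// callers maps each callee to the functions that call it (reverse call edges).
// This toy graph loosely follows the gogs example; names are simplified.
var callers = map[string][]string{
	"exec.Command":        {"handleServerConn", "com.ExecCmdDirBytes"},
	"com.ExecCmdDirBytes": {"com.ExecCmdDir"},
	"com.ExecCmdDir":      {"com.ExecCmd"},
	"com.ExecCmd":         {"handleServerConn"},
}

// backwardBFS walks reverse edges from the capability node and records, for
// each visited function, the one callee through which it was first reached.
func backwardBFS(capNode string) map[string]string {
	next := map[string]string{capNode: ""} // "" marks the capability itself
	queue := []string{capNode}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, caller := range callers[cur] {
			if _, seen := next[caller]; seen {
				continue // already visited: this alternative route is dropped
			}
			next[caller] = cur
			queue = append(queue, caller)
		}
	}
	return next
}

// pathFrom follows the recorded successors from fn to the capability.
func pathFrom(next map[string]string, fn string) []string {
	var path []string
	for cur := fn; cur != ""; cur = next[cur] {
		path = append(path, cur)
	}
	return path
}

func main() {
	next := backwardBFS("exec.Command")
	// Only the direct route survives; the com.ExecCmd chain is never reported.
	fmt.Println(pathFrom(next, "handleServerConn")) // [handleServerConn exec.Command]
}
```

Note that com.ExecCmd is still visited (it can reach the capability); only the edge from handleServerConn to it is never turned into a reported path.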

Real-world impact (issue #153):
When analyzing gogs.io/gogs/internal/ssh, the link points to an older revision because the relevant import was removed later, but the issue is evident there: in commit 3650b32, handleServerConn handles both env and exec requests, invoking com.ExecCmd("env", ...) in the env branch and exec.Command(conf.AppPath(), args...) in the exec branch. It therefore reaches exec.Command in two ways:

  • exec.Command directly (line 85): reported correctly as CAPABILITY_EXEC
  • com.ExecCmd (line 73) → ExecCmdDir → ExecCmdDirBytes → exec.Command: not reported

The VTA call graph contains all the edges; the issue is purely in the BFS result collection.

Why -granularity=intermediate does not solve this

As suggested in #153, using -granularity=intermediate does surface the unknwon/com package, but it is not an equivalent substitute:

  1. It uses a different code path (CapabilityGraph → searchForwardsFromQueriedFunctions), not forEachPath.
  2. Its output is per-package, not per-function — it tells you which packages are involved but not which function in your code triggers the transitive capability.
  3. -output json is the primary output mode users rely on for readable, function-level auditing. Users should not have to fall back to a coarser granularity to discover all capability paths.

Why this matters for supply chain security and malware analysis

Reporting all paths to a capability per function is critical for detecting supply chain attacks and malicious code injection:

  1. Detecting injected capability paths. A supply chain attack often injects malicious code into a dependency, adding a new call path to a sensitive capability (exec, network, file I/O) through an intermediate library. Without this fix, if a function already has a direct call to exec.Command, the BFS finds that path first and stops. An injected transitive path through a compromised dependency (e.g., evilpkg.Helper → exec.Command) is invisible — exactly what an attacker wants. The auditor sees CAPABILITY_EXEC and assumes it is the known direct call, unaware that a compromised dependency also gained exec access.

  2. Different flows cross different trust boundaries. A direct call to exec.Command may be fully controlled by the audited package (sanitized arguments, validated input). A transitive call through a third-party dependency passes control to external code with its own attack surface. By reporting both paths, an analyst can ask: Which dependencies have exec access? Is this new dependency in the call chain expected? Did a dependency update introduce a new path to a sensitive capability?

  3. Diff-based supply chain monitoring. Capslock's capslock-git-diff tool compares capability reports between commits/versions. If an attacker injects a new transitive path to a capability, but the function already had a direct path to that same capability, the old behavior would show no diff — the (function, capability) pair already existed. With this fix, the new transitive path through the compromised dependency appears as a new entry in the diff, raising a flag.

  4. Real-world example. In the gogs case, if github.com/unknwon/com were compromised, the ExecCmd call at line 73 could execute arbitrary commands with attacker-controlled arguments — but capslock's JSON output never mentioned com.ExecCmd at all. An auditor relying on capslock would have no visibility into this attack surface.
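Point 3 can be illustrated with a small Go sketch that diffs findings between two versions. The `finding` type and the `newFindings`, `byPair` and `byPath` helpers are hypothetical and are not how capslock-git-diff is implemented; the depPath strings are simplified. Keying on (function, capability) hides the injected route, while keying on the full path flags it.

```go
package main

import "fmt"

// finding is a simplified stand-in for one JSON capability entry.
type finding struct {
	Function   string
	Capability string
	DepPath    string
}

// newFindings returns the entries of after whose key does not occur in before.
func newFindings(before, after []finding, key func(finding) string) []finding {
	seen := make(map[string]bool)
	for _, f := range before {
		seen[key(f)] = true
	}
	var added []finding
	for _, f := range after {
		if !seen[key(f)] {
			added = append(added, f)
		}
	}
	return added
}

// byPair keys a finding by (function, capability), the old granularity.
func byPair(f finding) string { return f.Function + "|" + f.Capability }

// byPath also includes the dependency path, so distinct routes differ.
func byPath(f finding) string { return f.Function + "|" + f.Capability + "|" + f.DepPath }

func main() {
	before := []finding{{"app.Run", "CAPABILITY_EXEC", "app.Run exec.Command"}}
	after := []finding{
		before[0],
		{"app.Run", "CAPABILITY_EXEC", "app.Run evilpkg.Helper exec.Command"},
	}
	fmt.Println(len(newFindings(before, after, byPair))) // 0: injected path invisible
	fmt.Println(len(newFindings(before, after, byPath))) // 1: injected path flagged
}
```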

In short: reporting all paths to a capability per function turns capslock from a "does this function have capability X?" tool into a "through which dependencies does this function reach capability X?" tool — which is the question that actually matters for supply chain security.

Solution

Added a second pass in forEachPath after the existing BFS completes. For each function in the queried packages that was reached by the BFS:

  1. Examine all outgoing call edges.
  2. Find edges whose callee is also in the visited set (meaning it can reach the capability) but was not the callee recorded by the BFS.
  3. Temporarily swap the BFS state to report each alternative path via fn(), then restore the original state.

This preserves full backward compatibility — the original BFS path is still reported first — while additionally surfacing every alternative transitive route.
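The second pass can be sketched on a toy graph with string node names. The `calls` and `state` maps, the `secondPass` helper and the `report` callback are hypothetical simplifications, not capslock's bfsStateMap or CapabilityInfo types. Instead of mutating shared state, this sketch prepends the queried function to the callee's recorded path, which yields the same path a temporary swap would; it also skips callees whose path loops back through the function. Like this first version of the PR, it only finds paths that diverge at the queried function itself.

```go
package main

import "fmt"

// pathFrom follows successor links until the capability node, which is
// recorded with an empty successor.
func pathFrom(state map[string]string, fn string) []string {
	var path []string
	for cur := fn; cur != ""; cur = state[cur] {
		path = append(path, cur)
	}
	return path
}

// contains reports whether fn occurs in path.
func contains(path []string, fn string) bool {
	for _, p := range path {
		if p == fn {
			return true
		}
	}
	return false
}

// secondPass reports the BFS path of each reached queried function, then one
// extra path per outgoing edge whose callee also reaches the capability but
// differs from the recorded successor.
func secondPass(calls map[string][]string, state map[string]string, queried []string, report func([]string)) {
	for _, fn := range queried {
		recorded, reached := state[fn]
		if !reached {
			continue // the BFS never reached this function
		}
		report(pathFrom(state, fn))
		for _, callee := range calls[fn] {
			if _, ok := state[callee]; !ok || callee == recorded {
				continue // callee cannot reach the capability, or is the BFS choice
			}
			rest := pathFrom(state, callee)
			if contains(rest, fn) {
				continue // the callee's path runs back through fn; skip the cycle
			}
			// Same result as temporarily setting state[fn] = callee.
			report(append([]string{fn}, rest...))
		}
	}
}

func main() {
	calls := map[string][]string{
		"app.Handle":  {"exec.Command", "com.ExecCmd"},
		"com.ExecCmd": {"exec.Command"},
	}
	// Successor links as a backward BFS from exec.Command would record them.
	state := map[string]string{
		"exec.Command": "",
		"app.Handle":   "exec.Command",
		"com.ExecCmd":  "exec.Command",
	}
	secondPass(calls, state, []string{"app.Handle"}, func(p []string) { fmt.Println(p) })
	// [app.Handle exec.Command]
	// [app.Handle com.ExecCmd exec.Command]
}
```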

Before (-output json, excerpt)

{
  "capability": "CAPABILITY_EXEC",
  "depPath": "(gogs.io/gogs/internal/ssh.handleServerConn$1) exec.Command",
  "capabilityType": "CAPABILITY_TYPE_DIRECT"
}

Only the direct exec.Command call is shown; the com.ExecCmd transitive path is missing.

After

{
  "capability": "CAPABILITY_EXEC",
  "depPath": "(gogs.io/gogs/internal/ssh.handleServerConn$1) exec.Command",
  "capabilityType": "CAPABILITY_TYPE_DIRECT"
},
{
  "capability": "CAPABILITY_EXEC",
  "depPath": "(gogs.io/gogs/internal/ssh.handleServerConn$1) (github.com/unknwon/com.ExecCmd) ...",
  "capabilityType": "CAPABILITY_TYPE_TRANSITIVE"
}

Both the direct and transitive paths are now reported.

Fixes #153.

When a function in the queried package has multiple outgoing call edges
that each reach the same capability through different intermediate
functions, only one path was reported by the BFS. This caused the JSON
output to miss transitive capabilities reachable through alternative
call paths.

Add a second pass after the BFS that iterates over queried-package
functions and checks for additional outgoing edges leading to visited
nodes. For each such alternative edge, temporarily update the BFS state
and report the path, then restore the original state.

Fixes google#153
google-cla (bot) commented Mar 12, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

jcd2 (collaborator) commented Apr 29, 2026

This is good, but it still wouldn't report every path from a particular function to a capability, because there could be some paths that diverge only after one or more calls.

The -output=graph mode implicitly includes every path (by outputting all the edges that are part of any path) although it has package-level granularity at the moment. Maybe a mode that gives you the graph starting from a single function, or a separate graph for each function, would be useful?

battuto (author) commented Apr 30, 2026

Thanks, you were right. The previous second pass only handled paths that diverged immediately from the queried function, so it still missed cases like:

A -> B -> C -> capability
A -> B -> D -> capability

I updated the implementation to enumerate simple paths from each queried function through the subgraph of nodes that can reach the capability. For each path, it builds a temporary bfsStateMap representing that exact path and emits the existing CapabilityInfo shape, so the JSON schema remains unchanged; the output just contains multiple entries with different depPath values.
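The enumeration can be sketched as a depth-first search that only steps into nodes already known to reach the capability and never revisits a node on the current path, so every emitted path is simple (cycle-free). This is a hypothetical stand-alone version on string node names, not the PR's code; `canReach` stands in for the BFS visited set and `simplePaths` is an invented name.

```go
package main

import "fmt"

// simplePaths returns every cycle-free path from root to target that stays
// inside canReach (the set of nodes from which target is reachable).
func simplePaths(calls map[string][]string, canReach map[string]bool, root, target string) [][]string {
	var out [][]string
	onPath := map[string]bool{}
	var path []string
	var dfs func(n string)
	dfs = func(n string) {
		path = append(path, n)
		onPath[n] = true
		if n == target {
			out = append(out, append([]string(nil), path...)) // copy current path
		} else {
			for _, m := range calls[n] {
				if canReach[m] && !onPath[m] { // prune dead ends and cycles
					dfs(m)
				}
			}
		}
		onPath[n] = false
		path = path[:len(path)-1]
	}
	if canReach[root] {
		dfs(root)
	}
	return out
}

func main() {
	// A -> B -> C -> cap and A -> B -> D -> cap, as in the example in this comment.
	calls := map[string][]string{
		"A": {"B"},
		"B": {"C", "D"},
		"C": {"cap"},
		"D": {"cap"},
	}
	canReach := map[string]bool{"A": true, "B": true, "C": true, "D": true, "cap": true}
	for _, p := range simplePaths(calls, canReach, "A", "cap") {
		fmt.Println(p)
	}
	// [A B C cap]
	// [A B D cap]
}
```

Restricting the search to nodes that can reach the capability avoids exploring dead branches, but the number of simple paths can still grow quickly on densely connected graphs.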

I also kept the existing one-entry-per-function behavior for counts / omitted paths, so machine/count-style output does not accidentally become path-count based.

Added a regression test for the deeper divergence case:
A -> B -> dep.C
A -> B -> dep.D

Verified with:
go test ./analyzer

I also tried go test ./..., but in my local Windows 386 environment it fails before reaching this change because the cgo fixture is excluded and the assembly fixture does not build. The analyzer package tests pass.

battuto (author) commented Apr 30, 2026

I thought more about the graph-based approach you suggested and implemented it as a separate graph-json output mode, without changing the existing CapabilityInfo JSON schema.

The new mode emits a structured capability graph with function/capability nodes and call/capability edges, so consumers can reconstruct all paths from the graph instead of relying only on flattened depPath strings.

New options:

  • -output=graph-json
  • -graph_function=<exact function name> to emit the graph reachable from a single queried function
  • -graph_per_function to emit one graph per reachable queried function
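One way to picture what -graph_function selects is the set of edges reachable forward from the chosen root, restricted to nodes that can reach a capability; -graph_per_function would repeat that per queried root. The Go sketch below is a hypothetical illustration of that filtering on a toy edge list; the `edge` type and `subgraphFrom` helper are invented here and are not how the PR implements the flags.

```go
package main

import "fmt"

// edge is a directed call or capability edge in a toy graph.
type edge struct{ From, To string }

// subgraphFrom keeps the edges reachable from root whose endpoints both
// belong to canReach (nodes that can reach some capability).
func subgraphFrom(edges []edge, canReach map[string]bool, root string) []edge {
	adj := map[string][]edge{}
	for _, e := range edges {
		if canReach[e.From] && canReach[e.To] {
			adj[e.From] = append(adj[e.From], e)
		}
	}
	var out []edge
	seen := map[string]bool{root: true}
	stack := []string{root}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, e := range adj[n] {
			out = append(out, e)
			if !seen[e.To] {
				seen[e.To] = true
				stack = append(stack, e.To)
			}
		}
	}
	return out
}

func main() {
	edges := []edge{
		{"app.Handle", "os/exec.Command"},
		{"os/exec.Command", "CAPABILITY_EXEC"},
		{"app.Other", "os/exec.Command"}, // other root: excluded from this graph
		{"app.Handle", "log.Print"},      // cannot reach a capability: excluded
	}
	canReach := map[string]bool{
		"app.Handle": true, "app.Other": true,
		"os/exec.Command": true, "CAPABILITY_EXEC": true,
	}
	for _, e := range subgraphFrom(edges, canReach, "app.Handle") {
		fmt.Printf("%s -> %s\n", e.From, e.To)
	}
	// app.Handle -> os/exec.Command
	// os/exec.Command -> CAPABILITY_EXEC
}
```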

Example shape:

{
  "graphs": [
    {
      "root": "github.com/example/app.Handle",
      "capabilities": ["EXEC"],
      "nodes": [
        {"id": "github.com/example/app.Handle", "kind": "function", "package": "github.com/example/app"},
        {"id": "os/exec.Command", "kind": "function", "package": "os/exec"},
        {"id": "CAPABILITY_EXEC", "kind": "capability"}
      ],
      "edges": [
        {"from": "github.com/example/app.Handle", "to": "os/exec.Command", "kind": "call"},
        {"from": "os/exec.Command", "to": "CAPABILITY_EXEC", "kind": "capability"}
      ]
    }
  ]
}

I also added a regression test for the deeper divergence case:

A -> B -> dep.C -> FILES
A -> B -> dep.D -> FILES

Verified locally with:

go test ./analyzer
go test ./cmd/... ./interesting

So the existing JSON output still reports multiple depPath entries, while tools that need complete path reconstruction can consume the new graph JSON output.



Development: merging this pull request may close the issue "CAPABILITY_EXEC is not detected for unknwon/com package".
