analyzer: report alternative transitive call paths in JSON output #297
battuto wants to merge 6 commits into google:main
When a function in the queried package has multiple outgoing call edges that each reach the same capability through different intermediate functions, only one path was reported by the BFS. This caused the JSON output to miss transitive capabilities reachable through alternative call paths. Add a second pass after the BFS that iterates over queried-package functions and checks for additional outgoing edges leading to visited nodes. For each such alternative edge, temporarily update the BFS state and report the path, then restore the original state. Fixes google#153
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
This is good, but it still wouldn't report every path from a particular function to a capability, because there could be some paths that diverge only after one or more calls. The -output=graph mode implicitly includes every path (by outputting all the edges that are part of any path), although it has package-level granularity at the moment. Maybe a mode that gives you the graph starting from a single function, or a separate graph for each function, would be useful?
Thanks, you were right. The previous second pass only handled paths that diverged immediately from the queried function, so it still missed cases like:
    A -> B -> C -> capability
I updated the implementation to enumerate simple paths from each queried function through the subgraph of nodes that can reach the capability. For each path, it builds a temporary bfsStateMap representing that exact path and emits the existing CapabilityInfo shape, so the JSON schema remains unchanged; the output just contains multiple entries with different depPath values.
I also kept the existing one-entry-per-function behavior for counts / omitted paths, so machine/count-style output does not accidentally become path-count based.
Added a regression test for the deeper divergence case:
Verified with:
I also tried
…' into fix-alternative-transitive-paths
I thought more about the graph-based approach you suggested and implemented it as a separate
The new mode emits a structured capability graph with function/capability nodes and call/capability edges, so consumers can reconstruct all paths from the graph instead of relying only on flattened
New options:
Example shape:
    {
      "graphs": [
        {
          "root": "github.com/example/app.Handle",
          "capabilities": ["EXEC"],
          "nodes": [
            {"id": "github.com/example/app.Handle", "kind": "function", "package": "github.com/example/app"},
            {"id": "os/exec.Command", "kind": "function", "package": "os/exec"},
            {"id": "CAPABILITY_EXEC", "kind": "capability"}
          ],
          "edges": [
            {"from": "github.com/example/app.Handle", "to": "os/exec.Command", "kind": "call"},
            {"from": "os/exec.Command", "to": "CAPABILITY_EXEC", "kind": "capability"}
          ]
        }
      ]
    }
I also added a regression test for the deeper divergence case:
    A -> B -> dep.C -> FILES
Verified locally with:
    go test ./analyzer
So the existing JSON output still reports multiple depPath entries, while tools that need complete path reconstruction can consume the new graph JSON output.
Problem
When using -output json (the default, most-readable output mode), capslock reports only one call path per (function, capability) pair. This is because forEachPath runs a backward BFS from capability nodes and records a single bfsState per visited node. The first path discovered wins; any alternative transitive path through a different intermediate function is silently dropped.
Real-world impact (issue #153):
The example is gogs.io/gogs/internal/ssh. The hyperlink points to the affected revision because that import was removed later, but the issue is still evident in internal/ssh: in commit 3650b32, handleServerConn handles both env and exec requests, invoking com.ExecCmd("env", ...) in the env branch and exec.Command(conf.AppPath(), args...) in the exec branch. It therefore reaches exec.Command in two ways:
- exec.Command directly (line 85): reported correctly as CAPABILITY_EXEC
- com.ExecCmd (line 73) → ExecCmdDir → ExecCmdDirBytes → exec.Command: not reported
The VTA call graph contains all the edges; the issue is purely in the BFS result collection.
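The effect can be reproduced on a toy model of the gogs call chain. This is a minimal sketch, not capslock's forEachPath: the graph is hand-written with string node names, and `backwardBFS` and `pathFrom` are hypothetical helpers that record a single next hop per visited node, the way the description says a single bfsState is recorded.

```go
package main

import (
	"fmt"
	"strings"
)

// calls is a hand-written model of the call chain (caller -> callees).
var calls = map[string][]string{
	"handleServerConn":    {"exec.Command", "com.ExecCmd"},
	"com.ExecCmd":         {"com.ExecCmdDir"},
	"com.ExecCmdDir":      {"com.ExecCmdDirBytes"},
	"com.ExecCmdDirBytes": {"exec.Command"},
}

// backwardBFS walks from the capability node toward its callers and stores,
// for each visited node, one next hop toward the capability.
func backwardBFS(calls map[string][]string, capability string) map[string]string {
	callers := map[string][]string{}
	for from, tos := range calls {
		for _, to := range tos {
			callers[to] = append(callers[to], from)
		}
	}
	next := map[string]string{capability: ""}
	queue := []string{capability}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		for _, c := range callers[n] {
			if _, seen := next[c]; seen {
				continue // the first discovered path wins
			}
			next[c] = n
			queue = append(queue, c)
		}
	}
	return next
}

// pathFrom follows the recorded next hops from start to the capability.
func pathFrom(next map[string]string, start string) []string {
	var p []string
	for n := start; n != ""; n = next[n] {
		p = append(p, n)
	}
	return p
}

func main() {
	next := backwardBFS(calls, "exec.Command")
	// Prints: handleServerConn -> exec.Command
	fmt.Println(strings.Join(pathFrom(next, "handleServerConn"), " -> "))
}
```

The edge handleServerConn → com.ExecCmd exists in the graph, but by the time the search reaches com.ExecCmd its caller has already been visited, so the longer route is never recorded.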
Why -granularity=intermediate does not solve this
As suggested in #153, using -granularity=intermediate does surface the unknwon/com package, but it is not an equivalent substitute:
- CapabilityGraph → searchForwardsFromQueriedFunctions), not forEachPath.
- -output json is the primary output mode users rely on for readable, function-level auditing. Users should not have to fall back to a coarser granularity to discover all capability paths.
Why this matters for supply chain security and malware analysis
Reporting all paths to a capability per function is critical for detecting supply chain attacks and malicious code injection:
Detecting injected capability paths. A supply chain attack often injects malicious code into a dependency, adding a new call path to a sensitive capability (exec, network, file I/O) through an intermediate library. Without this fix, if a function already has a direct call to exec.Command, the BFS finds that path first and stops. An injected transitive path through a compromised dependency (e.g., evilpkg.Helper → exec.Command) is invisible, which is exactly what an attacker wants. The auditor sees CAPABILITY_EXEC and assumes it is the known direct call, unaware that a compromised dependency also gained exec access.
Different flows cross different trust boundaries. A direct call to exec.Command may be fully controlled by the audited package (sanitized arguments, validated input). A transitive call through a third-party dependency passes control to external code with its own attack surface. By reporting both paths, an analyst can ask: Which dependencies have exec access? Is this new dependency in the call chain expected? Did a dependency update introduce a new path to a sensitive capability?
Diff-based supply chain monitoring. Capslock's capslock-git-diff tool compares capability reports between commits/versions. If an attacker injects a new transitive path to a capability, but the function already had a direct path to that same capability, the old behavior would show no diff, because the (function, capability) pair already existed. With this fix, the new transitive path through the compromised dependency appears as a new entry in the diff, raising a flag.
Real-world example. In the gogs case, if github.com/unknwon/com were compromised, the ExecCmd call at line 73 could execute arbitrary commands with attacker-controlled arguments, but capslock's JSON output never mentioned com.ExecCmd at all. An auditor relying on capslock would have no visibility into this attack surface.
In short: reporting all paths to a capability per function turns capslock from a "does this function have capability X?" tool into a "through which dependencies does this function reach capability X?" tool, which is the question that actually matters for supply chain security.
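The diff argument can be made concrete with a toy comparison. This sketch does not reproduce how capslock-git-diff works internally; the `entry` type, the key functions, and the sample reports are all hypothetical. It only shows that a comparison keyed on the (function, capability) pair cannot see a second path, while one keyed on the path can.

```go
package main

import "fmt"

// entry is a simplified, hypothetical report row.
type entry struct {
	Function, Capability, DepPath string
}

// added returns the entries in after whose key does not occur in before.
func added(before, after []entry, key func(entry) string) []entry {
	seen := map[string]bool{}
	for _, e := range before {
		seen[key(e)] = true
	}
	var out []entry
	for _, e := range after {
		if !seen[key(e)] {
			out = append(out, e)
		}
	}
	return out
}

func byPair(e entry) string { return e.Function + "\x00" + e.Capability }
func byPath(e entry) string { return byPair(e) + "\x00" + e.DepPath }

// oldReport holds only the known direct call.
var oldReport = []entry{
	{"app.Run", "CAPABILITY_EXEC", "app.Run exec.Command"},
}

// newReport adds a transitive path through a compromised dependency.
var newReport = []entry{
	{"app.Run", "CAPABILITY_EXEC", "app.Run exec.Command"},
	{"app.Run", "CAPABILITY_EXEC", "app.Run evilpkg.Helper exec.Command"},
}

func main() {
	fmt.Println("new entries keyed by pair:", len(added(oldReport, newReport, byPair)))
	fmt.Println("new entries keyed by path:", len(added(oldReport, newReport, byPath)))
}
```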
Solution
Added a second pass in forEachPath after the existing BFS completes. For each function in the queried packages that was reached by the BFS:
fn(), then restore the original state.
This preserves full backward compatibility: the original BFS path is still reported first, while every alternative transitive route is additionally surfaced.
Before (-output json, excerpt)
    {
      "capability": "CAPABILITY_EXEC",
      "depPath": "(gogs.io/gogs/internal/ssh.handleServerConn$1) exec.Command",
      "capabilityType": "CAPABILITY_TYPE_DIRECT"
    }
Only the direct exec.Command call is shown; the com.ExecCmd transitive path is missing.
After
    {
      "capability": "CAPABILITY_EXEC",
      "depPath": "(gogs.io/gogs/internal/ssh.handleServerConn$1) exec.Command",
      "capabilityType": "CAPABILITY_TYPE_DIRECT"
    },
    {
      "capability": "CAPABILITY_EXEC",
      "depPath": "(gogs.io/gogs/internal/ssh.handleServerConn$1) (github.com/unknwon/com.ExecCmd) ...",
      "capabilityType": "CAPABILITY_TYPE_TRANSITIVE"
    }
Both the direct and transitive paths are now reported.
Fixes #153.