feat: support agent governance by kylewanginchina · Pull Request #11446 · deepflowio/deepflow

kylewanginchina · 2026-03-09T14:09:42Z

This PR is for:

Agent

Support agent governance

Checklist

Added unit test.

Backport to branches

lzf575

The ingest part looks fine.

xiaochaoren1 · 2026-03-12T09:13:11Z

In translation.go, it only needs to be added to INT_ENUM_PEER_TAG; nothing else needs to change.

Add inputs.proc.ai_agent config section with http_endpoints (default: /v1/chat/completions, /v1/embeddings), max_payload_size (default: 1MB), and file_io_enabled. Forward to LogParserConfig. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
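
As a rough illustration of the config section described above, here is a minimal Rust sketch of a struct carrying those three fields with the stated defaults. The struct name `AiAgentConfig`, the field types, reading "1MB" as 1 MiB, and the `false` default for `file_io_enabled` are assumptions; the commit message does not state them.

```rust
/// Hypothetical mirror of the `inputs.proc.ai_agent` config section.
/// The struct name and field types are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct AiAgentConfig {
    /// URL paths treated as LLM API endpoints.
    pub http_endpoints: Vec<String>,
    /// Maximum payload size, in bytes, kept for AI Agent flows.
    pub max_payload_size: usize,
    /// Whether file I/O events are collected for AI Agent processes.
    pub file_io_enabled: bool,
}

impl Default for AiAgentConfig {
    fn default() -> Self {
        Self {
            http_endpoints: vec![
                "/v1/chat/completions".to_string(),
                "/v1/embeddings".to_string(),
            ],
            // "1MB" in the commit message, read here as 1 MiB (assumption).
            max_payload_size: 1 << 20,
            // The default is not stated in the source; `false` is an assumption.
            file_io_enabled: false,
        }
    }
}
```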

Add BIZ_TYPE_DEFAULT (0) and BIZ_TYPE_AI_AGENT (1) constants for process classification in AI agent governance. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Stub module for AI Agent governance. Returns no-ops in open source. The real implementation is provided by the enterprise build of the enterprise-utils crate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Enterprise-gated hook calls enterprise_utils::ai_agent::match_ai_agent_endpoint to detect LLM API URLs. Sets endpoint and biz_type=AI_AGENT on match. Priority: WASM/biz_field > AI Agent detection > http_endpoint config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
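
The priority order in this commit can be sketched as follows. This is not the enterprise matcher: the `match_ai_agent_endpoint` signature, the prefix-matching rule, the `Resolved` type, and the `resolve` function are all hypothetical stand-ins used only to show the ordering WASM/biz_field > AI Agent detection > http_endpoint config.

```rust
pub const BIZ_TYPE_DEFAULT: u8 = 0;
pub const BIZ_TYPE_AI_AGENT: u8 = 1;

/// Hypothetical result type: the endpoint and biz_type attached to a request.
#[derive(Debug, Clone, PartialEq)]
pub struct Resolved {
    pub endpoint: Option<String>,
    pub biz_type: u8,
}

/// Hypothetical stand-in for the enterprise matcher. It strips the query
/// string and prefix-matches the path against the configured endpoints.
pub fn match_ai_agent_endpoint<'a>(path: &'a str, endpoints: &[String]) -> Option<&'a str> {
    let bare = path.split('?').next().unwrap_or(path);
    endpoints
        .iter()
        .any(|e| bare.starts_with(e.as_str()))
        .then_some(bare)
}

/// Applies the priority: a WASM/biz_field result wins, then AI Agent
/// detection, then the endpoint derived from the http_endpoint config.
pub fn resolve(
    custom: Option<Resolved>,
    path: &str,
    ai_endpoints: &[String],
    configured_endpoint: Option<String>,
) -> Resolved {
    if let Some(r) = custom {
        return r;
    }
    if let Some(ep) = match_ai_agent_endpoint(path, ai_endpoints) {
        return Resolved {
            endpoint: Some(ep.to_string()),
            biz_type: BIZ_TYPE_AI_AGENT,
        };
    }
    Resolved {
        endpoint: configured_endpoint,
        biz_type: BIZ_TYPE_DEFAULT,
    }
}
```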

AI Agent processes will be synced to controller with biz_type=1 (AI_AGENT). Field plumbing only — registry integration in a later task. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

AI Agent flows use ai_agent_max_payload_size (1MB default) instead of l7_log_packet_size to preserve full LLM request/response bodies for governance audit.

Changes:
- Add is_ai_agent flag to FlowLog (enterprise-gated) to track flows identified as AI Agent traffic via biz_type detection
- In l7_parse_log, use ai_agent_max_payload_size for payload truncation when the flow is marked as AI Agent
- After parse_payload returns, check parsed result for BIZ_TYPE_AI_AGENT and set the flag for subsequent packets in the flow
- Add L7ParseResult::has_biz_type() helper to check parsed results
- Saturate ParseParam::buf_size to u16::MAX to avoid overflow with larger AI Agent payload sizes

Enterprise feature only. Original behavior preserved for non-AI-Agent flows and non-enterprise builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
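
The truncation choice and the buf_size saturation can be illustrated with a small Rust sketch. The function names below are hypothetical and are not taken from the agent source; only the behavior (pick the larger limit for AI Agent flows, clamp to `u16::MAX` when the limit is stored in a 16-bit field) follows the commit message.

```rust
/// Picks the truncation limit for a flow: AI Agent flows use the larger
/// ai_agent_max_payload_size, all other flows keep l7_log_packet_size.
pub fn payload_limit(
    is_ai_agent: bool,
    l7_log_packet_size: usize,
    ai_agent_max_payload_size: usize,
) -> usize {
    if is_ai_agent {
        ai_agent_max_payload_size
    } else {
        l7_log_packet_size
    }
}

/// Clamps a limit into a u16 field instead of letting a plain cast wrap.
/// For example, (1 << 20) as u16 would silently become 0.
pub fn saturating_buf_size(limit: usize) -> u16 {
    limit.min(u16::MAX as usize) as u16
}

/// Truncates a payload to at most `limit` bytes.
pub fn truncate_payload(payload: &[u8], limit: usize) -> &[u8] {
    &payload[..payload.len().min(limit)]
}
```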

Add access_permission (__u16) to __io_event_buffer struct for exposing file permission bits (inode->i_mode & 0xFFF) in I/O events. Add #ifdef EXTENDED_AI_AGENT_FILE_IO hook in trace_io_event_common() that allows enterprise extensions to bypass the latency filter for AI agent processes and populate access_permission from the inode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
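
The mask itself is done in eBPF C in this commit. The following Rust sketch only illustrates what `i_mode & 0xFFF` keeps: the low 12 bits, that is the rwx bits plus setuid, setgid, and sticky, with the file-type bits dropped. The function name is hypothetical.

```rust
/// Extracts the permission bits from a raw inode mode value.
/// 0xFFF equals 0o7777: rwx for user/group/other plus setuid/setgid/sticky.
pub fn access_permission(i_mode: u32) -> u16 {
    (i_mode & 0xFFF) as u16
}
```

For instance, a regular file with mode `0o100644` yields `0o644`, since the `0o100000` file-type bit falls outside the mask.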

- Add global registry accessors (init_global_registry, global_registry) to enterprise-utils ai_agent module (stub returns None in open source)
- Initialize registry at startup in trident.rs (enterprise only)
- Register AI Agent PIDs in perf/mod.rs when biz_type detection fires
- proc_scan_hook checks registry to set biz_type=AI_AGENT on ProcessData

Enterprise feature only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
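
A process-wide registry with an initializer and an accessor could look roughly like the sketch below. Only the accessor names `init_global_registry` and `global_registry` come from the commit message; their signatures, the `AiAgentRegistry` fields, and the `register`/`is_ai_agent` methods shown here are assumptions, and the real enterprise implementation may differ substantially.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

/// Hypothetical registry of PIDs identified as AI Agent processes.
#[derive(Default)]
pub struct AiAgentRegistry {
    pids: Mutex<HashSet<u32>>,
}

impl AiAgentRegistry {
    /// Records a PID; returns true if it was not registered before.
    pub fn register(&self, pid: u32) -> bool {
        self.pids.lock().unwrap().insert(pid)
    }

    /// Reports whether a PID has been registered.
    pub fn is_ai_agent(&self, pid: u32) -> bool {
        self.pids.lock().unwrap().contains(&pid)
    }
}

static REGISTRY: OnceLock<AiAgentRegistry> = OnceLock::new();

/// Creates the registry on first call and returns it (assumed signature).
pub fn init_global_registry() -> &'static AiAgentRegistry {
    REGISTRY.get_or_init(AiAgentRegistry::default)
}

/// Returns None until init_global_registry has been called, which mirrors
/// the "stub returns None" behavior described for open-source builds.
pub fn global_registry() -> Option<&'static AiAgentRegistry> {
    REGISTRY.get()
}
```

Returning `Option` from the accessor lets callers such as a process-scan hook skip AI Agent handling entirely when no registry exists.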

… var
- Import L7ProtocolInfoInterface trait for get_biz_type() in l7_protocol_log.rs
- Prefix process_datas with underscore in proc_scan_hook.rs to suppress unused variable warning in non-enterprise builds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

In C, a label must be followed by a statement, not a declaration. The struct declaration after skip_latency_filter: causes a compile error when EXTENDED_AI_AGENT_FILE_IO is defined. Add a null statement (;) to satisfy the grammar requirement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

kylewanginchina · 2026-03-16T02:40:32Z

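> In translation.go, it only needs to be added to INT_ENUM_PEER_TAG; nothing else needs to change.

@xiaochaoren1 Do you mean that the part in the red box in the screenshot below is not needed, and biz_type only needs to be added to INT_ENUM_PEER_TAG?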

- proc_scan_hook: inject AI agent PIDs not matched by process_matcher so they appear in MySQL process table (not just l7_flow_log)
- handler.rs: add /v1/responses to default ai_agent_endpoints
- perf/mod.rs: remove redundant register() with empty endpoint
- http.rs: borrow path instead of cloning on every HTTP parse
- socket.c: change __set_ai_agent_data_limit_max param to unsigned int to fix dead code branch (limit_size > INT_MAX unreachable with int)
- server: decode access_permission from IoEventData into ClickHouse file_event table (column constant, EventStore field, column block)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove the record_endpoint_hit() method from the open-source AiAgentRegistry stub, consistent with the enterprise edition's removal of the endpoint uniqueness constraint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

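**Reproduction steps**

For child processes of an AI Agent (created by fork/exec), the gprocess_id of file_event and proc_lifecycle_event is 0, because the child process emits events before the controller has synced it to the process table.

**Cause and solution**

A child process's gprocess_id depends on the server looking it up through QueryProcessInfo, but a newly forked child may not have been synced to the process table yet. Solution: the agent maintains root_pid (the PID of the root AI Agent process originally identified by endpoint) and passes it to the server through the protobuf field ai_agent_root_pid. When both the cache lookup and the direct PID query fail, the server uses ai_agent_root_pid as a fallback to look up gprocess_id.

Changes:
- metric.proto: add ai_agent_root_pid field (tag=14) to ProcEvent
- proc_event/linux.rs: add ai_agent_root_pid field to the ProcEvent struct
- ebpf_dispatcher.rs: add fill_ai_agent_root_pid(), which looks up root_pid in the registry and fills it into the event
- decoder.go: add ai_agent_root_pid fallback to resolveGProcessID()
- enterprise-utils/lib.rs: add get_root_pid/register_child to the open-source stub

**Scope of impact**

Only affects gprocess_id resolution for AI Agent governance data collection.

**Verification**

- Unit test: TestResolveGProcessIDAiAgentRootPidFallback
- After deployment, verify that the gprocess_id of fork events is no longer 0

**Branches involved**

* support-agent-governance

**Checklist**

- [x] Dependencies need to be updated
- [ ] Is a common problem (similar issues exist elsewhere in the code)
- [ ] Compiles
- [ ] Unit tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>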
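
The lookup order described in this commit (cache, then direct PID query, then ai_agent_root_pid) can be sketched as follows. The actual change lives in decoder.go on the server side; this Rust version, including the `ProcessLookup` type and its two maps, is a hypothetical model of the ordering only and does not reproduce the server's data structures.

```rust
use std::collections::HashMap;

/// Hypothetical model of the server-side lookup state.
#[derive(Default)]
pub struct ProcessLookup {
    /// pid -> gprocess_id entries that were already resolved.
    pub cache: HashMap<u32, u32>,
    /// pid -> gprocess_id entries synced into the process table.
    pub process_table: HashMap<u32, u32>,
}

impl ProcessLookup {
    /// Resolves gprocess_id, returning 0 when nothing matches.
    /// An ai_agent_root_pid of 0 means the event carries no root PID.
    pub fn resolve_gprocess_id(&self, pid: u32, ai_agent_root_pid: u32) -> u32 {
        // 1. Cache hit.
        if let Some(&id) = self.cache.get(&pid) {
            return id;
        }
        // 2. Direct query by the event's own PID.
        if let Some(&id) = self.process_table.get(&pid) {
            return id;
        }
        // 3. Fallback: a freshly forked child is not synced yet, so reuse
        //    the gprocess_id of the root AI Agent process.
        if ai_agent_root_pid != 0 {
            if let Some(&id) = self.process_table.get(&ai_agent_root_pid) {
                return id;
            }
        }
        0
    }
}
```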

Strip "FileOp" prefix from event_type (fileopcreate→create), split full file path into file_dir + file_name, and populate access_permission for chmod events. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
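
A minimal sketch of the two normalizations follows. The function names are hypothetical, and two details are assumptions not stated in the commit message: the event type is lowercased before the prefix is stripped, and file_dir keeps its trailing slash.

```rust
/// Lowercases the event type and drops a leading "fileop" prefix,
/// e.g. "fileopcreate" becomes "create".
pub fn normalize_event_type(raw: &str) -> String {
    let lower = raw.to_ascii_lowercase();
    lower
        .strip_prefix("fileop")
        .unwrap_or(lower.as_str())
        .to_string()
}

/// Splits a full path into (file_dir, file_name) at the last '/'.
/// Keeping the trailing slash on file_dir is an assumption.
pub fn split_path(path: &str) -> (&str, &str) {
    match path.rfind('/') {
        Some(i) => (&path[..=i], &path[i + 1..]),
        None => ("", path),
    }
}
```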

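Previously, file read/write events from AI Agent processes bypassed only the latency filter and were still filtered by io_event_collect_mode (the default mode=1 requires trace association). A forked child process that performs independent file operations after exec has no trace_id, so its events were dropped. AI Agent processes now bypass both the collect_mode filter and the latency filter. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>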

kylewanginchina force-pushed the support-agent-governance branch 5 times, most recently from 6ea8ccf to 8980494 on March 11, 2026 15:55

deepflowio deleted a comment from claude bot Mar 12, 2026

kylewanginchina marked this pull request as ready for review March 12, 2026 03:19

lzf575 previously approved these changes Mar 12, 2026

kylewanginchina dismissed lzf575’s stale review via 70110ab March 12, 2026 06:57

kylewanginchina force-pushed the support-agent-governance branch 2 times, most recently from 9d6be57 to 4b22b71 on March 12, 2026 08:09

kylewanginchina force-pushed the support-agent-governance branch from d74d5d1 to eed5377 on March 12, 2026 15:25

kylewanginchina and others added 10 commits March 13, 2026 00:05

feat(agent): add BIZ_TYPE_AI_AGENT constant

60bddd8

feat(agent): add ai_agent stub in enterprise-utils

245b0ca

feat(agent): add biz_type to ProcessData and ProcessInfo proto

3330804

kylewanginchina force-pushed the support-agent-governance branch 3 times, most recently from c161d36 to 931bed0 on March 14, 2026 08:35

kylewanginchina force-pushed the support-agent-governance branch from 931bed0 to 8307b66 on March 16, 2026 03:46

kylewanginchina and others added 24 commits March 16, 2026 12:20

agent: add ai-agent chmod/chown/unlink tracepoints

c0d3329

Fix AI Agent cleanup using full proc scan

2aea671

Fix AI agent pid_tgid usage in socket trace

321af1c

Reduce AI agent stack usage in data submit

ba6a9bb

Sync biz_type for gprocess in multi-controller

c6b50f2

Fix missing is_ai_agent in socket_trace

95fe692

Avoid BPF stack usage for ai_agent flag

5b78c97

Fix proc scan hook warning and HTTP endpoint borrow

ac18359

agent: auto sync AI agent gprocess_info

0ebe436

agent: mark ai agent biz_type in gprocess

b45fa06

server: support gprocess.biz_type tag query

aa92135

agent: log ai agent pids for gprocess sync

a63adf3

ai-agent: inherit child proc lifecycle

5df64cc

ai-agent: expose proc event start time

b1be1b4

Fix proc lifecycle gprocess fallback and captured bytes

3a4ef94

fix: guard ai reasm bytes on invalid socket info

a8c70bb

Fix proc.gprocess_info refresh on process change

6d8c0ce

fix: enable reassembly after protocol inference

aaf3a3b

fix: enable reassembly on inferred protocol for existing sockets

9bde4c6

fix: ai-agent descendant processes inherit gprocess_id

1957515

fix: keep gprocess inheritance on proc exec, with unit tests

c128384

Ensure AI agent pids included in socket list sync

96fa37f

fix: propagate reasm_bytes on ebpf merge

d0ab245

kylewanginchina force-pushed the support-agent-governance branch from 8307b66 to d0ab245 on March 16, 2026 04:21

kylewanginchina and others added 2 commits March 16, 2026 23:44

fix: remove record_endpoint_hit stub from AI Agent registry

932234d

kylewanginchina force-pushed the support-agent-governance branch from 8e93928 to 94e4ea5 on March 17, 2026 11:13

kylewanginchina and others added 2 commits March 18, 2026 13:25

fix: normalize file_op event output to match IoEvent format

647d2d2

kylewanginchina wants to merge 44 commits into main from support-agent-governance

3 participants
