Summary
Propose a GuardrailProvider protocol that intercepts tool calls before execution, enabling policy-based approval, audit logging, and argument sanitization. This plugs into the existing BaseTool.run_json() and Workbench.call_tool() paths without breaking backward compatibility.
Motivation
AutoGen currently has no standardized hook point between an agent deciding to call a tool and the tool executing. The community has raised this gap from multiple angles:
- #6017 (Guardrails and Safety epic): comments call for "a scanning layer that inspects messages between agents" and for auditing tool calls at agent boundaries.
- #5891 (Support Approval Func in BaseTool in AgentChat): proposes an `approval_func` parameter on `BaseTool`, with open design questions about whether approval belongs at the tool or agent level.
- #5921 (Agentic Identity and Access Management, AIAM): identifies ten enterprise gaps, including excessive permissions, missing audit trails, and inconsistent policy enforcement.
- #5741 (Improved Tool Calling Context): requests context propagation (user credentials, trace IDs) through the tool call stack.
Issue #5891 tackles the approval surface specifically but scopes it to a boolean gate. A GuardrailProvider generalizes this to support argument rewriting, structured denial reasons, audit metadata, and composable policy chains -- all concerns raised across the issues above.
Proposed Interface
```python
from __future__ import annotations

from abc import abstractmethod
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Protocol, Sequence, runtime_checkable

from autogen_core import CancellationToken


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    """Outcome of a guardrail evaluation."""

    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None  # only when Decision.MODIFY
    metadata: dict[str, Any] = field(default_factory=dict)  # audit trail data


@runtime_checkable
class GuardrailProvider(Protocol):
    """Intercepts tool calls before execution for policy enforcement."""

    @abstractmethod
    async def evaluate(
        self,
        *,
        tool_name: str,
        args: Mapping[str, Any],
        agent_name: str | None = None,
        call_id: str | None = None,
        cancellation_token: CancellationToken | None = None,
    ) -> GuardrailResult:
        """Evaluate whether a tool call should proceed.

        Args:
            tool_name: Name of the tool being invoked.
            args: Arguments the agent wants to pass.
            agent_name: Identity of the calling agent, if known.
            call_id: Correlation ID for the tool call.
            cancellation_token: For cooperative cancellation.

        Returns:
            GuardrailResult indicating allow, deny, or modify.
        """
        ...
```

Integration Points
1. BaseTool.run_json() -- tool-level guard
Minimal change to run_json() in BaseTool:
```python
async def run_json(
    self,
    args: Mapping[str, Any],
    cancellation_token: CancellationToken,
    call_id: str | None = None,
) -> Any:
    effective_args = args
    # Providers run in order; each one sees the args as rewritten by earlier ones.
    for provider in self._guardrail_providers:
        result = await provider.evaluate(
            tool_name=self._name,
            args=effective_args,
            call_id=call_id,
            cancellation_token=cancellation_token,
        )
        if result.decision == Decision.DENY:
            # Short-circuit: report the denial to the model instead of raising.
            return f"Tool call denied: {result.reason or 'policy violation'}"
        if result.decision == Decision.MODIFY and result.modified_args is not None:
            effective_args = result.modified_args
    # Validate after the guardrails so rewritten args are still schema-checked.
    validated = self._args_type.model_validate(effective_args)
    return await self.run(validated, cancellation_token)
```

2. Workbench.call_tool() -- workbench-level guard
For MCP and dynamic tool sources, guardrails can wrap call_tool() at the workbench layer, covering tools that do not subclass BaseTool.
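A minimal sketch of the workbench-level guard, assuming a duck-typed tool source that exposes `async call_tool(name, arguments)`. `GuardedWorkbench`, `ToolSource`, `EchoWorkbench`, and `BlockShell` are hypothetical names for illustration, not AutoGen APIs; the real `Workbench.call_tool()` takes more parameters (these are omitted here, as are `call_id` and `cancellation_token` forwarding), and `Decision`/`GuardrailResult` are re-declared in trimmed form so the snippet runs on its own.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Protocol, Sequence


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


class Provider(Protocol):
    async def evaluate(self, *, tool_name: str, args: Mapping[str, Any]) -> GuardrailResult: ...


class ToolSource(Protocol):
    # Hypothetical minimal workbench shape; AutoGen's real Workbench.call_tool()
    # takes more parameters and returns a richer result type.
    async def call_tool(self, name: str, arguments: Mapping[str, Any]) -> Any: ...


class GuardedWorkbench:
    """Hypothetical wrapper: run guardrails, then delegate to the inner source."""

    def __init__(self, inner: ToolSource, providers: Sequence[Provider]) -> None:
        self._inner = inner
        self._providers = tuple(providers)

    async def call_tool(self, name: str, arguments: Mapping[str, Any]) -> Any:
        effective_args = arguments
        for provider in self._providers:
            result = await provider.evaluate(tool_name=name, args=effective_args)
            if result.decision is Decision.DENY:
                # Report the denial as the call's result rather than raising.
                return f"Tool call denied: {result.reason or 'policy violation'}"
            if result.decision is Decision.MODIFY and result.modified_args is not None:
                effective_args = result.modified_args
        return await self._inner.call_tool(name, effective_args)


class EchoWorkbench:
    """Stand-in tool source (think: an MCP server) that echoes its input."""

    async def call_tool(self, name: str, arguments: Mapping[str, Any]) -> Any:
        return {"tool": name, "arguments": dict(arguments)}


class BlockShell:
    """Example provider: deny the tool named "shell", allow everything else."""

    async def evaluate(self, *, tool_name: str, args: Mapping[str, Any], **_: Any) -> GuardrailResult:
        if tool_name == "shell":
            return GuardrailResult(Decision.DENY, reason="shell is disabled")
        return GuardrailResult(Decision.ALLOW)


if __name__ == "__main__":
    wb = GuardedWorkbench(EchoWorkbench(), [BlockShell()])
    print(asyncio.run(wb.call_tool("search", {"q": "autogen"})))
    print(asyncio.run(wb.call_tool("shell", {"cmd": "ls"})))
```

Wrapping rather than subclassing keeps the guard independent of how the inner source discovers its tools, which is the point of guarding at this layer.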
3. AssistantAgent -- agent-level guard
Pass providers to `AssistantAgent`, which forwards them to its tools, consistent with the pattern proposed in #5891 for `approval_func`.
Constructor Addition to BaseTool
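```python
def __init__(
    self,
    args_type: Type[ArgsT],
    return_type: Type[ReturnT],
    name: str,
    description: str,
    strict: bool = False,
    guardrail_providers: Sequence[GuardrailProvider] = (),  # new, optional
) -> None:
    ...  # existing initialization unchanged
    self._guardrail_providers = tuple(guardrail_providers)
```

Fully backward compatible: the new parameter comes last and defaults to an empty tuple, so existing tools and subclasses are unaffected.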
Design Rationale
| Design choice | Why |
|---|---|
| Protocol, not ABC | Matches AutoGen's use of `runtime_checkable` protocols; avoids forcing inheritance |
| `Decision` enum with `MODIFY` | Addresses #5891's open question about parameter modification vs. simple approval |
| `metadata` on result | Supports audit trail requirements from #5921 (AIAM) |
| Keyword-only `evaluate()` args | Future-proof: new fields can be added without positional-order breakage; implementations that accept `**kwargs` tolerate newly added fields |
| Composable chain | Multiple providers run in sequence; any `DENY` short-circuits |
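The composable-chain row can be made concrete with a small combinator that itself has the provider shape, so a chain can be nested or passed wherever a single provider is expected. This is a sketch under stated assumptions: `GuardrailChain`, `AllowList`, and `RedactSecrets` are hypothetical names, keying each provider's metadata by its class name is one possible audit convention (not part of the proposal), and `Decision`/`GuardrailResult` are re-declared so the block runs standalone.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Sequence


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


class GuardrailChain:
    """Hypothetical combinator: runs providers in order; the first DENY wins."""

    def __init__(self, providers: Sequence[Any]) -> None:
        self._providers = tuple(providers)

    async def evaluate(self, *, tool_name: str, args: Mapping[str, Any], **kwargs: Any) -> GuardrailResult:
        effective_args = args
        modified = False
        audit: dict[str, Any] = {}
        for provider in self._providers:
            result = await provider.evaluate(tool_name=tool_name, args=effective_args, **kwargs)
            audit[type(provider).__name__] = result.metadata  # per-provider audit data
            if result.decision is Decision.DENY:
                return GuardrailResult(Decision.DENY, reason=result.reason, metadata=audit)
            if result.decision is Decision.MODIFY and result.modified_args is not None:
                effective_args = result.modified_args  # later providers see rewritten args
                modified = True
        if modified:
            return GuardrailResult(Decision.MODIFY, modified_args=effective_args, metadata=audit)
        return GuardrailResult(Decision.ALLOW, metadata=audit)


class AllowList:
    """Example DENY provider: only named tools may run."""

    def __init__(self, allowed: set[str]) -> None:
        self._allowed = allowed

    async def evaluate(self, *, tool_name: str, args: Mapping[str, Any], **_: Any) -> GuardrailResult:
        if tool_name in self._allowed:
            return GuardrailResult(Decision.ALLOW)
        return GuardrailResult(Decision.DENY, reason=f"{tool_name} is not allow-listed")


class RedactSecrets:
    """Example MODIFY provider: masks values whose key looks sensitive."""

    SENSITIVE = ("password", "token", "api_key")

    async def evaluate(self, *, tool_name: str, args: Mapping[str, Any], **_: Any) -> GuardrailResult:
        hits = [k for k in args if k.lower() in self.SENSITIVE]
        if not hits:
            return GuardrailResult(Decision.ALLOW)
        cleaned = {k: ("***" if k in hits else v) for k, v in args.items()}
        return GuardrailResult(Decision.MODIFY, modified_args=cleaned, metadata={"redacted": hits})


if __name__ == "__main__":
    chain = GuardrailChain([AllowList({"http_get"}), RedactSecrets()])
    print(asyncio.run(chain.evaluate(tool_name="http_get", args={"url": "u", "token": "abc"})))
    print(asyncio.run(chain.evaluate(tool_name="shell", args={})))
```

Because the chain exposes the same `evaluate()` shape, it could be handed to `guardrail_providers` as a single entry; whether to ship such a combinator or rely on the sequence parameter alone is left open here.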
Example: Rate-Limiting Provider
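```python
from __future__ import annotations

import time
from collections import defaultdict
from typing import Any, Mapping

# Assumes Decision, GuardrailResult and CancellationToken as in the interface above.

class RateLimitGuardrail:
    """Sliding-window limit: at most max_calls per tool within window_seconds."""

    def __init__(self, max_calls: int = 10, window_seconds: float = 60.0) -> None:
        self._max = max_calls
        self._window = window_seconds
        self._calls: dict[str, list[float]] = defaultdict(list)

    async def evaluate(
        self,
        *,
        tool_name: str,
        args: Mapping[str, Any],
        agent_name: str | None = None,
        call_id: str | None = None,
        cancellation_token: CancellationToken | None = None,
    ) -> GuardrailResult:
        now = time.monotonic()
        # Keep only the timestamps still inside the window.
        recent = [t for t in self._calls[tool_name] if now - t < self._window]
        if len(recent) >= self._max:
            return GuardrailResult(
                decision=Decision.DENY,
                reason=f"Rate limit: {self._max} calls per {self._window}s exceeded",
            )
        self._calls[tool_name] = [*recent, now]
        return GuardrailResult(decision=Decision.ALLOW)
```

Relationship to Existing Work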
- #5891 (Support Approval Func in BaseTool in AgentChat, `approval_func`): This proposal generalizes that design. An `approval_func` can be trivially wrapped as a `GuardrailProvider`. If maintainers prefer to land #5891 first, `GuardrailProvider` can layer on top.
- #6017 (Guardrails and Safety epic): This provides a concrete, minimal interface for the tool-call interception portion of the epic.
- #4721 (Workbench, aka "the tool pool", design proposal): Workbench's `call_tool()` is a natural second integration point.
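To illustrate the claim that an `approval_func` can be wrapped as a `GuardrailProvider`, here is a minimal adapter. #5891's final signature is not settled, so the sketch assumes a hypothetical callable taking `(tool_name, args)` and returning a `bool` or an awaitable `bool`; `ApprovalFuncGuardrail`, `no_deletes`, and `ask_async` are hypothetical names, and the trimmed `Decision`/`GuardrailResult` are re-declared so the block runs on its own.

```python
from __future__ import annotations

import asyncio
import inspect
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Awaitable, Callable, Mapping, Union


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


# Assumed approval_func shape; #5891 may settle on something different.
ApprovalFunc = Callable[[str, Mapping[str, Any]], Union[bool, Awaitable[bool]]]


class ApprovalFuncGuardrail:
    """Hypothetical adapter turning a boolean approval_func into a provider."""

    def __init__(self, approval_func: ApprovalFunc) -> None:
        self._approve = approval_func

    async def evaluate(self, *, tool_name: str, args: Mapping[str, Any], **_: Any) -> GuardrailResult:
        verdict = self._approve(tool_name, args)
        if inspect.isawaitable(verdict):
            verdict = await verdict  # accept both sync and async approval funcs
        if verdict:
            return GuardrailResult(Decision.ALLOW, metadata={"approved_by": "approval_func"})
        return GuardrailResult(Decision.DENY, reason="rejected by approval_func")


def no_deletes(tool_name: str, args: Mapping[str, Any]) -> bool:
    """Sync example: block any tool whose name starts with 'delete_'."""
    return not tool_name.startswith("delete_")


async def ask_async(tool_name: str, args: Mapping[str, Any]) -> bool:
    """Async example: stand-in for waiting on a human or remote policy check."""
    await asyncio.sleep(0)
    return args.get("amount", 0) <= 100


if __name__ == "__main__":
    guard = ApprovalFuncGuardrail(no_deletes)
    print(asyncio.run(guard.evaluate(tool_name="delete_file", args={})))
```

The adapter only ever returns `ALLOW` or `DENY`, which is exactly the boolean gate #5891 describes; `MODIFY` and richer metadata remain available to providers written directly against the protocol.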
A reference implementation of policy-based tool guardrails using this interface pattern is available in the APort Agent Guardrails project.
Scope and Non-Goals
This proposal covers tool call interception only. It does not cover:
- Message/prompt scanning between agents (a separate middleware concern under Guardrails and Safety #6017)
- Identity federation or token delegation (covered by #5921, AIAM)
- Code execution sandboxing (covered by existing Docker/container patterns)
Next Steps
- Gather feedback on Protocol vs. ABC and on the `MODIFY` decision variant.
- Decide whether to integrate with #5891's `approval_func` or supersede it.
- Prototype in `autogen-core` with tests against `FunctionTool` and `McpWorkbench`.