Proposal: GuardrailProvider protocol for tool call interception #7405

@uchibeke

Summary

Propose a GuardrailProvider protocol that intercepts tool calls before execution, enabling policy-based approval, audit logging, and argument sanitization. This plugs into the existing BaseTool.run_json() and Workbench.call_tool() paths without breaking backward compatibility.

Motivation

AutoGen currently has no standardized hook point between an agent deciding to call a tool and the tool executing. The community has raised this gap from multiple angles:

Issue #5891 tackles the approval surface specifically but scopes it to a boolean gate. A GuardrailProvider generalizes this to support argument rewriting, structured denial reasons, audit metadata, and composable policy chains -- all concerns raised across the issues above.

Proposed Interface

from __future__ import annotations

from abc import abstractmethod
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Protocol, Sequence, runtime_checkable

from autogen_core import CancellationToken


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    """Outcome of a guardrail evaluation."""

    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None  # only when Decision.MODIFY
    metadata: dict[str, Any] = field(default_factory=dict)  # audit trail data


@runtime_checkable
class GuardrailProvider(Protocol):
    """Intercepts tool calls before execution for policy enforcement."""

    @abstractmethod
    async def evaluate(
        self,
        *,
        tool_name: str,
        args: Mapping[str, Any],
        agent_name: str | None = None,
        call_id: str | None = None,
        cancellation_token: CancellationToken | None = None,
    ) -> GuardrailResult:
        """Evaluate whether a tool call should proceed.

        Args:
            tool_name: Name of the tool being invoked.
            args: Arguments the agent wants to pass.
            agent_name: Identity of the calling agent, if known.
            call_id: Correlation ID for the tool call.
            cancellation_token: For cooperative cancellation.

        Returns:
            GuardrailResult indicating allow, deny, or modify.
        """
        ...
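
Because the interface is a runtime_checkable Protocol, a provider needs no base class. The following is a minimal sketch of that, assuming local stand-ins for the proposed types (so it runs without autogen_core) and a hypothetical DenyListGuardrail that is not part of the proposal:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Protocol, runtime_checkable


# Stand-ins mirroring the proposed interface so the sketch runs without autogen_core.
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class GuardrailProvider(Protocol):
    async def evaluate(
        self,
        *,
        tool_name: str,
        args: Mapping[str, Any],
        agent_name: str | None = None,
        call_id: str | None = None,
        cancellation_token: Any = None,  # CancellationToken in the proposed interface
    ) -> GuardrailResult: ...


class DenyListGuardrail:
    """Hypothetical provider: denies any tool whose name is on a block list."""

    def __init__(self, blocked: set[str]) -> None:
        self._blocked = blocked

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        if tool_name in self._blocked:
            return GuardrailResult(
                Decision.DENY,
                reason=f"{tool_name} is blocked",
                metadata={"call_id": call_id},
            )
        return GuardrailResult(Decision.ALLOW)


guard = DenyListGuardrail({"delete_file"})

# No inheritance needed: the runtime check only looks for an `evaluate` attribute.
assert isinstance(guard, GuardrailProvider)

result = asyncio.run(guard.evaluate(tool_name="delete_file", args={}, call_id="c1"))
print(result.decision, result.reason)
```

Note that isinstance() against a runtime_checkable Protocol verifies only that the method exists, not its signature or that it is async, so a registration path may still want its own validation.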

Integration Points

1. BaseTool.run_json() -- tool-level guard

Minimal change to run_json() in BaseTool:

async def run_json(
    self,
    args: Mapping[str, Any],
    cancellation_token: CancellationToken,
    call_id: str | None = None,
) -> Any:
    effective_args = args

    for provider in self._guardrail_providers:
        result = await provider.evaluate(
            tool_name=self._name,
            args=effective_args,
            call_id=call_id,
            cancellation_token=cancellation_token,
        )
        if result.decision == Decision.DENY:
            return f"Tool call denied: {result.reason or 'policy violation'}"
        if result.decision == Decision.MODIFY and result.modified_args is not None:
            effective_args = result.modified_args

    validated = self._args_type.model_validate(effective_args)
    return await self.run(validated, cancellation_token)
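
One consequence of the loop above is that provider order matters: each provider sees the arguments as rewritten by the providers before it. The sketch below isolates that loop in a hypothetical apply_guardrails helper, with stand-in types and two illustrative providers (ClampLimit and DenyLargeLimit are assumptions, not part of the proposal):

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Sequence


# Stand-ins for the proposed types so the sketch runs without autogen_core.
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


async def apply_guardrails(
    providers: Sequence[Any], tool_name: str, args: Mapping[str, Any]
) -> tuple[Mapping[str, Any] | None, str | None]:
    """Return (effective_args, None) when allowed, or (None, reason) on the first DENY."""
    effective = args
    for provider in providers:
        result = await provider.evaluate(tool_name=tool_name, args=effective)
        if result.decision is Decision.DENY:
            return None, result.reason or "policy violation"
        if result.decision is Decision.MODIFY and result.modified_args is not None:
            effective = result.modified_args  # later providers see the rewritten args
    return effective, None


class ClampLimit:
    """Hypothetical provider: caps a numeric `limit` argument at 100."""

    async def evaluate(self, *, tool_name, args, **_ignored):
        if args.get("limit", 0) > 100:
            return GuardrailResult(Decision.MODIFY, modified_args={**args, "limit": 100})
        return GuardrailResult(Decision.ALLOW)


class DenyLargeLimit:
    """Hypothetical provider: denies any call whose `limit` exceeds 100."""

    async def evaluate(self, *, tool_name, args, **_ignored):
        if args.get("limit", 0) > 100:
            return GuardrailResult(Decision.DENY, reason="limit too large")
        return GuardrailResult(Decision.ALLOW)


request = {"query": "logs", "limit": 500}

# Clamp first: the deny rule only ever sees limit=100, so the call proceeds.
clamp_then_deny = asyncio.run(
    apply_guardrails([ClampLimit(), DenyLargeLimit()], "search", request)
)

# Deny first: the chain short-circuits before the clamp can run.
deny_then_clamp = asyncio.run(
    apply_guardrails([DenyLargeLimit(), ClampLimit()], "search", request)
)
```

If this ordering sensitivity is undesirable, an alternative would be to evaluate every provider against the original arguments and merge the results, at the cost of a more complex conflict rule.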

2. Workbench.call_tool() -- workbench-level guard

For MCP and dynamic tool sources, guardrails can wrap call_tool() at the workbench layer, covering tools that do not subclass BaseTool.
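
A rough sketch of what such a wrapper could look like follows. GuardedWorkbench, EchoWorkbench, and BlockTool are hypothetical names, the wrapped object is assumed only to expose an async call_tool(name, arguments), and the dictionary returned on denial is a placeholder for whatever result type the real workbench uses:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Sequence


# Stand-ins for the proposed types so the sketch runs without autogen_core.
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


class GuardedWorkbench:
    """Hypothetical wrapper: runs providers, then delegates to the wrapped workbench."""

    def __init__(self, inner: Any, providers: Sequence[Any]) -> None:
        self._inner = inner
        self._providers = tuple(providers)

    async def call_tool(
        self,
        name: str,
        arguments: Mapping[str, Any] | None = None,
        call_id: str | None = None,
    ) -> Any:
        effective: Mapping[str, Any] = arguments or {}
        for provider in self._providers:
            result = await provider.evaluate(
                tool_name=name, args=effective, call_id=call_id
            )
            if result.decision is Decision.DENY:
                # Assumed error shape; a real workbench would return its own result type.
                message = f"Tool call denied: {result.reason or 'policy violation'}"
                return {"is_error": True, "content": message}
            if result.decision is Decision.MODIFY and result.modified_args is not None:
                effective = result.modified_args
        return await self._inner.call_tool(name, effective)


class EchoWorkbench:
    """Stand-in for an MCP-style workbench; echoes the call it received."""

    async def call_tool(self, name: str, arguments: Mapping[str, Any] | None = None) -> Any:
        return {"is_error": False, "content": {"tool": name, "arguments": dict(arguments or {})}}


class BlockTool:
    """Hypothetical provider: denies one tool by name."""

    def __init__(self, blocked: str) -> None:
        self._blocked = blocked

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        if tool_name == self._blocked:
            return GuardrailResult(Decision.DENY, reason=f"{tool_name} is not permitted")
        return GuardrailResult(Decision.ALLOW)


workbench = GuardedWorkbench(EchoWorkbench(), [BlockTool("shell_exec")])
denied = asyncio.run(workbench.call_tool("shell_exec", {"cmd": "ls"}))
allowed = asyncio.run(workbench.call_tool("fetch", {"url": "https://example.com"}))
```

Wrapping rather than subclassing keeps the guard independent of how a given workbench discovers or executes its tools.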

3. AssistantAgent -- agent-level guard

Pass providers to AssistantAgent which forwards them to its tools, consistent with the pattern proposed in #5891 for approval_func.

Constructor Addition to BaseTool

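def __init__(
    self,
    args_type: Type[ArgsT],
    return_type: Type[ReturnT],
    name: str,
    description: str,
    strict: bool = False,
    guardrail_providers: Sequence[GuardrailProvider] = (),  # new, optional
) -> None:
    ...  # existing initialization unchanged
    # Stored as a tuple so the chain iterated by run_json() cannot be mutated later.
    self._guardrail_providers = tuple(guardrail_providers)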

Fully backward compatible -- existing tools and subclasses are unaffected.

Design Rationale

| Decision | Why |
| --- | --- |
| Protocol, not ABC | Matches AutoGen's use of runtime_checkable protocols; avoids forcing inheritance |
| Decision enum with MODIFY | Addresses #5891's open question about parameter modification vs. simple approval |
| metadata on result | Supports audit trail requirements from #5921 (AIAM) |
| Keyword-only evaluate() args | Future-proof; new fields can be added without breaking implementations |
| Composable chain | Multiple providers run in sequence; any DENY short-circuits |

Example: Rate-Limiting Provider

import time
from collections import defaultdict

class RateLimitGuardrail:
    def __init__(self, max_calls: int = 10, window_seconds: float = 60.0):
        self._max = max_calls
        self._window = window_seconds
        self._calls: dict[str, list[float]] = defaultdict(list)

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        now = time.monotonic()
        recent = [t for t in self._calls[tool_name] if now - t < self._window]
        if len(recent) >= self._max:
            return GuardrailResult(
                decision=Decision.DENY,
                reason=f"Rate limit: {self._max} calls per {self._window}s exceeded",
            )
        self._calls[tool_name] = [*recent, now]
        return GuardrailResult(decision=Decision.ALLOW)
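
The rate limiter only uses ALLOW and DENY. To illustrate MODIFY and the metadata field, here is a second sketch: a hypothetical AllowedArgsGuardrail that strips arguments not on a per-tool allow-list and records what it dropped. The class, its policy format, and the metadata keys are assumptions for illustration, and the types are local stand-ins so the block runs without autogen_core:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping


# Stand-ins for the proposed types so the sketch runs without autogen_core.
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


class AllowedArgsGuardrail:
    """Hypothetical provider: drops arguments that are not on a per-tool allow-list."""

    def __init__(self, allowed: Mapping[str, frozenset[str]]) -> None:
        self._allowed = allowed

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        permitted = self._allowed.get(tool_name)
        if permitted is None:
            # No policy registered for this tool: pass the call through untouched.
            return GuardrailResult(Decision.ALLOW)

        dropped = sorted(key for key in args if key not in permitted)
        if not dropped:
            return GuardrailResult(Decision.ALLOW)

        return GuardrailResult(
            decision=Decision.MODIFY,
            reason="removed arguments outside the allow-list",
            modified_args={k: v for k, v in args.items() if k in permitted},
            metadata={"dropped_args": dropped, "call_id": call_id},  # audit trail
        )


guard = AllowedArgsGuardrail({"http_get": frozenset({"url", "timeout"})})
result = asyncio.run(
    guard.evaluate(
        tool_name="http_get",
        args={"url": "https://example.com", "headers": {"X-Debug": "1"}},
        call_id="call-7",
    )
)
print(result.decision, dict(result.modified_args or {}), result.metadata)
```

The provider builds a new mapping instead of mutating args, so the caller's original arguments remain available for logging or for comparison against the rewritten ones.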

Relationship to Existing Work

A reference implementation of policy-based tool guardrails using this interface pattern is available in the APort Agent Guardrails project.

Scope and Non-Goals

This proposal covers tool call interception only. It does not cover:

Next Steps

  1. Gather feedback on Protocol vs. ABC and the MODIFY decision variant.
  2. Decide whether to integrate with the approval_func from #5891 (Support Approval Func in BaseTool in AgentChat) or supersede it.
  3. Prototype in autogen-core with tests against FunctionTool and McpWorkbench.
