Proposal: GuardrailProvider protocol for tool call interception

## Summary

Propose a `GuardrailProvider` protocol that intercepts tool calls before execution, enabling policy-based approval, audit logging, and argument sanitization. This plugs into the existing `BaseTool.run_json()` and `Workbench.call_tool()` paths without breaking backward compatibility.

## Motivation

AutoGen currently has no standardized hook point between an agent deciding to call a tool and the tool executing. The community has raised this gap from multiple angles:

- **#6017** -- Guardrails and Safety epic: comments call for "a scanning layer that inspects messages between agents" and auditing tool calls at agent boundaries.
- **#5891** -- Support Approval Func in BaseTool: proposes an `approval_func` parameter on `BaseTool`, with open design questions about whether approval belongs at the tool or agent level.
- **#5921** -- Agentic Identity and Access Management (AIAM): identifies ten enterprise gaps including excessive permissions, missing audit trails, and inconsistent policy enforcement.
- **#5741** -- Improved Tool Calling Context: requests context propagation (user credentials, trace IDs) through the tool call stack.

Issue #5891 tackles the approval surface specifically but scopes it to a boolean gate. A `GuardrailProvider` generalizes this to support argument rewriting, structured denial reasons, audit metadata, and composable policy chains -- all concerns raised across the issues above.

## Proposed Interface

```python
from __future__ import annotations

from abc import abstractmethod
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Mapping, Protocol, Sequence, runtime_checkable

from autogen_core import CancellationToken


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    """Outcome of a guardrail evaluation."""

    decision: Decision
    reason: str | None = None
    modified_args: Mapping[str, Any] | None = None  # only when Decision.MODIFY
    metadata: dict[str, Any] = field(default_factory=dict)  # audit trail data


@runtime_checkable
class GuardrailProvider(Protocol):
    """Intercepts tool calls before execution for policy enforcement."""

    @abstractmethod
    async def evaluate(
        self,
        *,
        tool_name: str,
        args: Mapping[str, Any],
        agent_name: str | None = None,
        call_id: str | None = None,
        cancellation_token: CancellationToken | None = None,
    ) -> GuardrailResult:
        """Evaluate whether a tool call should proceed.

        Args:
            tool_name: Name of the tool being invoked.
            args: Arguments the agent wants to pass.
            agent_name: Identity of the calling agent, if known.
            call_id: Correlation ID for the tool call.
            cancellation_token: For cooperative cancellation.

        Returns:
            GuardrailResult indicating allow, deny, or modify.
        """
        ...
```
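
As an illustration of the `MODIFY` path, here is a minimal sketch of a provider that caps a numeric argument instead of rejecting the call. `ClampLimitGuardrail` and the `limit` argument name are hypothetical and not part of AutoGen. `Decision` and `GuardrailResult` are restated compactly so the snippet runs on its own.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Mapping, Optional


# Stand-ins for the types defined in the interface above (restated for a standalone run).
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: Optional[str] = None
    modified_args: Optional[Mapping[str, Any]] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class ClampLimitGuardrail:
    """Hypothetical provider: caps an integer argument at a maximum value."""

    def __init__(self, key: str = "limit", max_value: int = 100) -> None:
        self._key = key
        self._max = max_value

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        value = args.get(self._key)
        # Missing, non-integer, or in-range values pass through untouched.
        if not isinstance(value, int) or value <= self._max:
            return GuardrailResult(decision=Decision.ALLOW)
        # Return a rewritten copy; the caller's mapping is never mutated.
        return GuardrailResult(
            decision=Decision.MODIFY,
            reason=f"{self._key} capped at {self._max}",
            modified_args={**args, self._key: self._max},
            metadata={"original_value": value},
        )


result = asyncio.run(
    ClampLimitGuardrail().evaluate(
        tool_name="search", args={"query": "autogen", "limit": 5000}
    )
)
```

The original value is recorded in `metadata` so an audit log can show what the agent asked for as well as what was executed.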

### Integration Points

**1. BaseTool.run_json() -- tool-level guard**

Minimal change to `run_json()` in `BaseTool`:

```python
async def run_json(
    self,
    args: Mapping[str, Any],
    cancellation_token: CancellationToken,
    call_id: str | None = None,
) -> Any:
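    # Providers may rewrite the arguments, so track the current version separately.
    effective_args = args

    # Run providers in order; each one sees the arguments left by the previous one.
    for provider in self._guardrail_providers:
        result = await provider.evaluate(
            tool_name=self._name,
            args=effective_args,
            call_id=call_id,
            cancellation_token=cancellation_token,
        )
        if result.decision == Decision.DENY:
            # Short-circuit: later providers and the tool itself never run.
            return f"Tool call denied: {result.reason or 'policy violation'}"
        if result.decision == Decision.MODIFY and result.modified_args is not None:
            effective_args = result.modified_args

    # Validate after the chain so rewritten arguments are also checked against the schema.
    validated = self._args_type.model_validate(effective_args)
    return await self.run(validated, cancellation_token)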
```

**2. Workbench.call_tool() -- workbench-level guard**

For MCP and dynamic tool sources, guardrails can wrap `call_tool()` at the workbench layer, covering tools that do not subclass `BaseTool`.
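
One possible shape for the workbench-level guard is a wrapper that runs the providers and then delegates. The sketch below is not the actual `Workbench` API: `GuardedWorkbench`, `EchoWorkbench`, and `DenyShell` are hypothetical, and the `call_tool()` parameter order is an assumption to check against the installed `autogen-core`. How a denial should be surfaced at this layer is left open; the sketch returns the same denial string as the tool-level guard.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Mapping, Optional, Sequence


# Stand-ins for the proposed types (restated for a standalone run).
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: Optional[str] = None
    modified_args: Optional[Mapping[str, Any]] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class GuardedWorkbench:
    """Hypothetical wrapper: evaluates providers, then delegates to an inner workbench."""

    def __init__(self, inner: Any, providers: Sequence[Any]) -> None:
        self._inner = inner
        self._providers = list(providers)

    async def call_tool(self, name, arguments=None, cancellation_token=None, call_id=None):
        effective = dict(arguments or {})
        for provider in self._providers:
            result = await provider.evaluate(
                tool_name=name,
                args=effective,
                call_id=call_id,
                cancellation_token=cancellation_token,
            )
            if result.decision == Decision.DENY:
                # The inner workbench is never reached on a denial.
                return f"Tool call denied: {result.reason or 'policy violation'}"
            if result.decision == Decision.MODIFY and result.modified_args is not None:
                effective = dict(result.modified_args)
        return await self._inner.call_tool(name, effective, cancellation_token, call_id)


class EchoWorkbench:
    """Demo stand-in for a real workbench: echoes the call it receives."""

    async def call_tool(self, name, arguments=None, cancellation_token=None, call_id=None):
        return {"tool": name, "arguments": dict(arguments or {})}


class DenyShell:
    """Demo provider: blocks a tool named 'shell' (a made-up tool name)."""

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        if tool_name == "shell":
            return GuardrailResult(Decision.DENY, reason="shell access is disabled")
        return GuardrailResult(Decision.ALLOW)


guarded = GuardedWorkbench(EchoWorkbench(), [DenyShell()])
allowed = asyncio.run(guarded.call_tool("search", {"q": "autogen"}))
denied = asyncio.run(guarded.call_tool("shell", {"cmd": "ls"}))
```

Because the wrapper only depends on `call_tool()`, it would apply equally to tools that never pass through `BaseTool.run_json()`.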

**3. AssistantAgent -- agent-level guard**

Pass providers to `AssistantAgent` which forwards them to its tools, consistent with the pattern proposed in #5891 for `approval_func`.

### Constructor Addition to BaseTool

```python
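def __init__(
    self,
    args_type: Type[ArgsT],
    return_type: Type[ReturnT],
    name: str,
    description: str,
    strict: bool = False,
    guardrail_providers: Sequence[GuardrailProvider] = (),  # new, optional
) -> None:
    ...  # existing initialization unchanged
    # Store as a tuple so run_json() iterates a fixed, ordered chain.
    self._guardrail_providers = tuple(guardrail_providers)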
```

Fully backward compatible -- `guardrail_providers` defaults to an empty tuple, so existing tools and subclasses are unaffected.

## Design Rationale

| Decision | Why |
|---|---|
| Protocol, not ABC | Matches AutoGen's use of `runtime_checkable` protocols; avoids forcing inheritance |
| `Decision` enum with MODIFY | Addresses #5891's open question about parameter modification vs. simple approval |
| `metadata` on result | Supports audit trail requirements from #5921 (AIAM) |
| Keyword-only `evaluate()` args | Future-proof; new fields can be added without breaking implementations |
| Composable chain | Multiple providers run in sequence; any DENY short-circuits |
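
The composable-chain row can also be expressed as a provider that wraps other providers. The sketch below assumes the semantics described above (sequential evaluation, `DENY` short-circuits, `MODIFY` feeds the next provider). `ChainGuardrail` and `Recorder` are hypothetical names, and metadata aggregation is left out for brevity.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Mapping, Optional


# Stand-ins for the proposed types (restated for a standalone run).
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: Optional[str] = None
    modified_args: Optional[Mapping[str, Any]] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class ChainGuardrail:
    """Hypothetical composite: runs providers in order and stops at the first DENY."""

    def __init__(self, providers) -> None:
        self._providers = list(providers)

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        effective, modified = args, False
        for provider in self._providers:
            result = await provider.evaluate(
                tool_name=tool_name,
                args=effective,
                agent_name=agent_name,
                call_id=call_id,
                cancellation_token=cancellation_token,
            )
            if result.decision == Decision.DENY:
                return result  # remaining providers are skipped
            if result.decision == Decision.MODIFY and result.modified_args is not None:
                effective, modified = result.modified_args, True
        if modified:
            return GuardrailResult(Decision.MODIFY, modified_args=effective)
        return GuardrailResult(Decision.ALLOW)


calls: List[str] = []


class Recorder:
    """Demo provider: records that it ran and returns a fixed result."""

    def __init__(self, label: str, result: GuardrailResult) -> None:
        self._label, self._result = label, result

    async def evaluate(self, *, tool_name, args, **_ignored) -> GuardrailResult:
        calls.append(self._label)
        return self._result


modify = Recorder("modify", GuardrailResult(Decision.MODIFY, modified_args={"q": "x"}))
deny = Recorder("deny", GuardrailResult(Decision.DENY, reason="blocked"))
allow = Recorder("allow", GuardrailResult(Decision.ALLOW))

denied = asyncio.run(
    ChainGuardrail([modify, deny, allow]).evaluate(tool_name="search", args={"q": " x "})
)
calls_when_denied = list(calls)  # "allow" never ran

calls.clear()
passed = asyncio.run(
    ChainGuardrail([modify, allow]).evaluate(tool_name="search", args={"q": " x "})
)
```

Since the composite satisfies the same `evaluate()` signature, chains can be nested or passed anywhere a single provider is accepted.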

## Example: Rate-Limiting Provider

```python
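import time
from collections import defaultdict

# Decision and GuardrailResult come from the interface defined above.


class RateLimitGuardrail:
    """Denies a tool call once `max_calls` have been allowed within the sliding window."""

    def __init__(self, max_calls: int = 10, window_seconds: float = 60.0):
        self._max = max_calls
        self._window = window_seconds
        # Timestamps of allowed calls, keyed by tool name (shared across agents).
        self._calls: dict[str, list[float]] = defaultdict(list)

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        # Monotonic clock: unaffected by system clock adjustments.
        now = time.monotonic()
        # Keep only the timestamps that still fall inside the window.
        recent = [t for t in self._calls[tool_name] if now - t < self._window]
        if len(recent) >= self._max:
            # Denied calls are not recorded, so they do not extend the window.
            return GuardrailResult(
                decision=Decision.DENY,
                reason=f"Rate limit: {self._max} calls per {self._window}s exceeded",
            )
        self._calls[tool_name] = [*recent, now]
        return GuardrailResult(decision=Decision.ALLOW)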
```

## Relationship to Existing Work

- **#5891 (approval_func):** This proposal generalizes that design. An `approval_func` can be trivially wrapped as a `GuardrailProvider`. If maintainers prefer to land #5891 first, `GuardrailProvider` can layer on top.
- **#6017 (Guardrails epic):** This provides a concrete, minimal interface for the tool-call interception portion of the epic.
- **#4721 (Workbench):** Workbench's `call_tool()` is a natural second integration point.
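
To make the wrapping claim for `approval_func` concrete, here is a minimal adapter sketch. The callback signature `(tool_name, args) -> bool` is an assumption, since #5891 leaves the exact shape open, and `ApprovalFuncGuardrail` is a hypothetical name. Both sync and async callbacks are accepted.

```python
import asyncio
import inspect
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, Mapping, Optional


# Stand-ins for the proposed types (restated for a standalone run).
class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"


@dataclass
class GuardrailResult:
    decision: Decision
    reason: Optional[str] = None
    modified_args: Optional[Mapping[str, Any]] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class ApprovalFuncGuardrail:
    """Hypothetical adapter: turns a boolean approval callback into a provider."""

    def __init__(self, approval_func: Callable[[str, Mapping[str, Any]], Any]) -> None:
        self._approval_func = approval_func

    async def evaluate(
        self, *, tool_name, args, agent_name=None, call_id=None, cancellation_token=None
    ) -> GuardrailResult:
        approved = self._approval_func(tool_name, args)
        # Support async callbacks by awaiting the returned coroutine.
        if inspect.isawaitable(approved):
            approved = await approved
        if approved:
            return GuardrailResult(decision=Decision.ALLOW)
        return GuardrailResult(decision=Decision.DENY, reason="rejected by approval_func")


def no_deletes(tool_name: str, args: Mapping[str, Any]) -> bool:
    # Example policy with a made-up tool name.
    return tool_name != "delete_file"


guard = ApprovalFuncGuardrail(no_deletes)
read_result = asyncio.run(guard.evaluate(tool_name="read_file", args={"path": "a.txt"}))
delete_result = asyncio.run(guard.evaluate(tool_name="delete_file", args={"path": "a.txt"}))
```

The adapter never returns `MODIFY`, which reflects the boolean gate that #5891 describes.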

A reference implementation of policy-based tool guardrails using this interface pattern is available in the [APort Agent Guardrails](https://github.com/aport-platform/aport-agent-guardrails) project.

## Scope and Non-Goals

This proposal covers **tool call interception only**. It does not cover:

- Message/prompt scanning between agents (a separate middleware concern under #6017)
- Identity federation or token delegation (covered by #5921 AIAM)
- Code execution sandboxing (covered by existing Docker/container patterns)

## Next Steps

1. Gather feedback on Protocol vs. ABC and the MODIFY decision variant.
2. Decide whether to integrate with #5891's `approval_func` or supersede it.
3. Prototype in `autogen-core` with tests against `FunctionTool` and `McpWorkbench`.

