percona · theTibi · Jan 22, 2026 · Jan 22, 2026 · Jan 23, 2026 · Jan 23, 2026
@@ -264,6 +264,23 @@
       client_max_body_size 0;
     }
 
+    # ADRE streaming endpoints - longer timeout for HolmesGPT investigate/chat
+    location /v1/adre/ {
+      proxy_pass http://managed-json/v1/adre/;
+      proxy_http_version 1.1;
+      proxy_set_header Connection "";
+      proxy_read_timeout 600;
+      proxy_buffering off;
+    }
+
+    # Grafana panel render (can take 20–60s); longer read timeout, enable disk cache via cache=1
+    location /v1/grafana/render {
+      proxy_pass http://managed-json;
+      proxy_http_version 1.1;
+      proxy_set_header Connection "";
+      proxy_read_timeout 120;
+    }
+
     # pmm-managed JSON APIs
     location /v1/ {
       proxy_pass http://managed-json/v1/;

@@ -0,0 +1,152 @@
+# Autonomous Database Reliability Engineer (ADRE) / HolmesGPT Integration
+
+ADRE integrates [HolmesGPT](https://holmesgpt.dev) with PMM to provide AI-assisted database reliability analysis, chat, and alert investigation.
+
+This branch targets **HolmesGPT 0.22+**: PMM uses **`POST /api/chat` only** (no `/api/investigate`), and tunes behaviour via **`behavior_controls`** in settings.
+
+## Prerequisites
+
+- HolmesGPT running in a container (or elsewhere) and reachable from the PMM server
+- Optional: [mcp-clickhouse](https://github.com/ClickHouse/mcp-clickhouse) for ClickHouse/otel.logs/QAN analysis
+
+## Configuration
+
+1. Enable ADRE in **PMM Settings** (Configuration → Settings → Advanced) or on the ADRE / AI Assistant page (admin only).
+2. Set the **HolmesGPT base URL** to a reachable HTTPS (or HTTP in lab) origin, for example `https://holmes.example.internal` — **do not** commit real hosts or secrets to documentation.
+3. If HolmesGPT requires authentication, configure it through **PMM settings** (preferred) or follow HolmesGPT’s documented URL/header patterns. **Never** paste API keys, Grafana tokens, or passwords into public docs or chat logs.
+
+HolmesGPT and PMM must be able to communicate. If using Docker or Kubernetes, ensure network policies and TLS match your security requirements.
+
+### Fast vs Investigation (`default_chat_mode`, `mode` on chat)
+
+The ADRE panel and `POST /v1/adre/chat` use **Fast** (quick answers, minimal runbooks/TodoWrite by default) vs **Investigation** (full investigation behaviour). Differences are driven by Holmes **`behavior_controls`** maps stored in PMM settings (`behavior_controls_fast`, `behavior_controls_investigation`) plus separate **`additional_system_prompt`** texts (`chat_prompt`, `investigation_prompt`). See [Holmes fast mode / prompt controls](https://holmesgpt.dev/dev/reference/http-api/?h=fast#fast-mode--prompt-controls).
+
+A third map, **`behavior_controls_format_report`**, applies only to the investigation report formatting pass.
+
+**`adre_max_conversation_messages`** caps how many messages PMM sends as `conversation_history` to Holmes (mitigates context overflow when Holmes fails fast on oversized prompts).
+
+**`ENABLED_PROMPTS` on the Holmes container** can override what the HTTP API is allowed to enable; if operators set it restrictively, PMM behaviour-control toggles may appear ineffective — document this next to AI Assistant settings for your environment.
+
+Investigations and QAN insights call the Holmes client against **`Adre.URL`** only (no separate PMM Agent path).
+
+## HolmesGPT Configuration
+
+Configure HolmesGPT to use PMM data sources:
+
+- **Prometheus**: `https://<pmm-host>/victoriametrics/` (with auth if required)
+- **Alertmanager**: `https://<pmm-host>/prometheus/alerts` (or internal URL if same network)
+
+## ClickHouse (Logs, QAN)
+
+HolmesGPT has no built-in ClickHouse toolset. To enable log and QAN analysis:
+
+1. Run [mcp-clickhouse](https://github.com/ClickHouse/mcp-clickhouse) in a container
+2. Point it at PMM’s ClickHouse (host, port, user, password must be reachable from HolmesGPT)
+3. Add it as an MCP server in HolmesGPT config (streamable-http transport)
+   - Example: `url: "http://mcp-clickhouse:8000/mcp/messages"`, `mode: streamable-http`
+
+PMM does not run or configure mcp-clickhouse; you manage it and HolmesGPT configuration yourself.
+
+## Adding custom tools to HolmesGPT
+
+HolmesGPT supports two ways to add your own tools:
+
+### 1. Custom toolsets (YAML)
+
+Define tools as shell commands in a `toolsets.yaml` file. Each tool has a `name`, `description`, and `command`; the LLM infers parameters from `{{ variable }}` placeholders. Use this for scripts, `curl` calls to APIs, or `kubectl`/CLI commands.
+
+- **CLI:** `holmes ask "your question" --custom-toolsets=toolsets.yaml`; after editing run `holmes toolset refresh`.
+- **Helm:** Configure under `holmes.customToolsets` in your values.
+
+See [HolmesGPT Custom Toolsets](https://holmesgpt.dev/data-sources/custom-toolsets/).
+
+### 2. MCP servers (recommended for new integrations)
+
+Implement an [MCP](https://modelcontextprotocol.io/) server that exposes tools; HolmesGPT connects to it and discovers tools dynamically.
+
+- **Transport:** Prefer `streamable-http`: your server exposes an HTTP endpoint (e.g. `http://your-mcp:8000/mcp/messages`); HolmesGPT calls it with `mode: streamable-http`.
+- **Config:** Add the server under `mcp_servers` in `~/.holmes/config.yaml` or in Helm under `holmes.mcp_servers`, with `config.url`, `config.mode`, optional `config.headers`, and `llm_instructions` (when/how the LLM should use it).
+
+Example (config file):
+
+```yaml
+mcp_servers:
+  my_tools:
+    description: "My custom PMM tools"
+    config:
+      url: "http://my-mcp-server:8000/mcp/messages"
+      mode: streamable-http
+    llm_instructions: "Use these tools for schema, EXPLAIN, and index inspection when investigating database issues."
+```
+
+If your MCP server runs inside or alongside PMM, ensure HolmesGPT can reach it (network, auth, and security as discussed earlier).
+
+See [HolmesGPT MCP Servers](https://holmesgpt.dev/data-sources/remote-mcp-servers/).
+
+## Grafana context in ADRE Chat (PMM UI)
+
+The PMM shell builds **structured Grafana context** when the user is on Grafana routes (`/graph/d/...`, `d-solo`, `explore`, etc.): normalized path, dashboard UID, `viewPanel` when present, `from`/`to`, `var-*` parameters, optional **document title** from the iframe. Implementation: `ui/apps/pmm/src/components/adre/grafana-context.ts` (fragment; `GrafanaProvider` supplies `grafanaDocumentTitle`).
+
+The UI sends it as **`dashboard_context`** on `POST /v1/adre/chat`. **pmm-managed** appends it to Holmes **`additional_system_prompt`** (alongside the mode-specific prompt).
+
+## Holmes operator configuration (not shipped inside PMM)
+
+PMM **does not** ship `holmes_config.yaml` or Markdown **runbooks** in the repository. Operators maintain them on the **HolmesGPT** deployment:
+
+- **Toolsets** — Often defined in YAML (custom toolsets) or via **MCP** servers. Point Prometheus/VictoriaMetrics, PMM inventory tools, ClickHouse (QAN/logs), and optional `curl` tools at URLs reachable from Holmes (see [HolmesGPT docs](https://holmesgpt.dev)).
+- **Runbooks** — Markdown files plus a **catalog** (e.g. `catalog.json`) so the `fetch_runbook` tool can load steps. Paths are configured in Holmes, not in PMM.
+- **PMM-facing URLs** — Use a **browser-reachable** PMM base URL for markdown images and Grafana links where Holmes embeds `/v1/grafana/render` or `/graph/...`.
+
+## `GET /v1/grafana/render` (panel image proxy)
+
+Served by **pmm-managed**. Used by Holmes toolsets or scripts to fetch a **PNG** of a dashboard panel or to return **JSON** with URLs for the PMM UI.
+
+**Required query parameters:** `dashboard_uid`, `panel_id`, `from`, `to`.
+
+**Common optional parameters:** `width`, `height`, `format=json` (returns JSON with `image_url` and `dashboard_url` instead of raw PNG), `cache=1` (optional **disk cache** under `/srv/pmm/grafana_render_cache` on the server), `tz`, and any `var-*` Grafana template variables needed for the dashboard (e.g. `var-service_id`).
+
+**Validation:** `dashboard_uid` and `panel_id` must match safe character classes enforced by the handler.
+
+**Auth:** Forwarding uses the caller’s `Authorization` header when calling Grafana’s render path.
+
+For **end-user** documentation, panel-image behaviour is intentionally **not** expanded in MkDocs; this section is for **integrators**.
+
+## Grafana panel render and dashboard links (Holmes / tools)
+
+When Holmes (or a tool) renders a Grafana panel image via PMM’s render API and includes an “Open in Grafana” link in the same message, follow this contract so the UI shows one correct link per panel:
+
+1. **Use the render tool’s `dashboard_url`.** When the render tool (e.g. calling PMM `GET /v1/grafana/render?format=json`) returns `image_url` and `dashboard_url`, the model must use that exact `dashboard_url` for any “Open in Grafana” (or “Open the … panel”) link in the same message as the panel image. Do not construct the dashboard link from other parameters or default time ranges; otherwise the link can have the wrong timeframe.
+
+2. **Match panel to narrative.** The panel id (and dashboard) used for the render must match what the model describes (e.g. if the answer says “QPS graph”, the rendered panel must be the QPS panel, not a different one like “MySQL Connections”).
+
+3. **Duplicate links are suppressed by PMM.** Duplicate “Open in Grafana” links in markdown are suppressed by the PMM UI when they refer to a panel that already has a render image in the message; the only link shown is the one under the image (with the correct timeframe). So one link per panel from the render tool response is enough.
+
+## API
+
+PMM proxies requests to HolmesGPT where noted. Endpoints **require PMM authentication** unless stated otherwise.
+
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | /v1/adre/settings | Get ADRE settings (Holmes URL, `behavior_controls_*`, prompts, `adre_max_conversation_messages`, QAN prompt display fields, ServiceNow configured flag — no secrets in GET) |
+| POST | /v1/adre/settings | Update ADRE settings (admin); may set `servicenow_url`, `servicenow_api_key`, `servicenow_client_token` — store securely |
+| GET | /v1/adre/models | List available models from HolmesGPT when ADRE enabled |
+| POST | /v1/adre/chat | Chat; `stream: true` for SSE streaming; optional `mode`: `fast` or `investigation` (legacy `chat` treated as `fast`); optional `dashboard_context` merged into Holmes `additional_system_prompt` |
+| GET | /v1/adre/alerts | Firing alerts from Grafana Alertmanager (ADRE enabled) |
+| POST | /v1/adre/qan-insights | Body: `service_id`, `query_text` (required); optional `query_id`, `fingerprint`, `time_from`, `time_to`, `force`. Returns analysis JSON; caches by `(query_id, service_id)` when `query_id` set |
+| GET | /v1/adre/qan-insights | Query params: `query_id`, `service_id` — returns cached analysis or 404 |
+| GET | /v1/grafana/render | Panel PNG or JSON (`format=json`); see section above |
+
+**Investigations** live under `/v1/investigations/*` — see [dev/investigations/README.md](../investigations/README.md).
+
+### End-to-end flow (mermaid)
+
+```mermaid
+sequenceDiagram
+  participant User as PMM_UI
+  participant PMM as pmm_managed
+  participant Holmes as HolmesGPT
+  User->>PMM: POST /v1/adre/chat
+  PMM->>Holmes: Chat API
+  Holmes-->>PMM: analysis stream
+  PMM-->>User: SSE or JSON
+```
@@ -0,0 +1,94 @@
+# PMM Investigations (developer / operator notes)
+
+**Investigations** are persisted incident pages under `/v1/investigations` in **pmm-managed**. The UI lists investigations, shows block-based reports, supports chat, **Run investigation**, **PDF export**, and optional **ServiceNow** ticket creation.
+
+This file is **not** part of the published Percona MkDocs site; it lives next to the Go sources for contributors and operators.
+
+## Architecture reference
+
+- **ADR-001** — [0001-pmm-ai-investigations.md](../../documentation/docs/adr/0001-pmm-ai-investigations.md) (original orchestrator/Ollama narrative; see note below).
+- **ADR-002** — [0002-investigations-data-model-and-api.md](../../documentation/docs/adr/0002-investigations-data-model-and-api.md) (data model and REST shape).
+
+**Implementation note:** Investigation **chat** and **run** use **HolmesGPT** only (`adre.NewClient(settings.GetAdreURL())`): `POST /api/chat` with `investigation_prompt`, **`behavior_controls_investigation`**, and (for the formatting pass) **`behavior_controls_format_report`**. A separate Ollama orchestrator process is **not** required for that deployment model. ADR-001 remains historical context; align product docs with the code path you ship.
+
+## Prerequisites
+
+- **HolmesGPT URL** configured in PMM **AI Assistant / ADRE** settings (`GetAdreURL()` non-empty). Chat and run return HTTP 400 if missing.
+
+## REST API summary
+
+All routes are prefixed with `/v1/investigations`. Authenticate like other PMM APIs.
+
+| Method | Path pattern | Purpose |
+| ------ | ------------ | ------- |
+| GET | `/v1/investigations` | List investigations |
+| POST | `/v1/investigations` | Create investigation |
+| GET | `/v1/investigations/:id` | Get one |
+| PATCH | `/v1/investigations/:id` | Update metadata / status |
+| DELETE | `/v1/investigations/:id` | Delete |
+| GET/POST | `/v1/investigations/:id/blocks` | List / create blocks |
+| PATCH/DELETE | `/v1/investigations/:id/blocks/:blockId` | Update / delete block |
+| GET/POST | `/v1/investigations/:id/timeline` | Timeline events |
+| GET/POST | `/v1/investigations/:id/artifacts` | Artifacts |
+| GET/POST | `/v1/investigations/:id/comments` | Comments |
+| GET | `/v1/investigations/:id/messages` | Chat message history |
+| POST | `/v1/investigations/:id/chat` | One chat round (Holmes `/api/chat`) |
+| POST | `/v1/investigations/:id/run` | Start background **Run investigation** (202 Accepted) |
+| GET | `/v1/investigations/:id/export/pdf` | Download PDF report |
+| POST | `/v1/investigations/:id/servicenow` | Create ServiceNow ticket (requires settings) |
+
+Details and JSON shapes: **ADR-002** and `managed/services/investigations/handlers.go`.
+
+## Chat flow (`POST .../chat`)
+
+1. Load investigation; validate Holmes URL.
+2. Persist the user `message`.
+3. Build `conversation_history` from stored messages (roles `user`, `assistant`, `tool`).
+4. Call `adre.Client.Chat` with investigation context, **`behavior_controls_investigation`**, and trimmed history (`adre_max_conversation_messages`).
+5. Persist assistant reply; return `{ "content": "..." }`.
+
+## Run investigation (`POST .../run`)
+
+Returns **202** immediately; work continues in `runInvestigationBackground`:
+
+1. Calls Holmes **`Chat`** (`/api/chat`) with a structured ask, investigation prompt, context, and **`behavior_controls_investigation`**.
+2. **`FormatInvestigationReport`** — second LLM pass via `adre.Client.Chat` with **`behavior_controls_format_report`** to normalize markdown into JSON sections.
+4. **`ParseFormattedReport`** — creates **blocks** and **timeline** rows; updates investigation summary fields.
+
+Timeouts: **5 minutes** for run and chat (see `investigationRunTimeout` / `investigationChatTimeout` in `chat.go`).
+
+## ServiceNow (`POST .../servicenow`)
+
+Requires **non-empty** `Adre.ServiceNowURL`, `ServiceNowAPIKey`, and `ServiceNowClientToken` in PMM settings (set via `POST /v1/adre/settings`). The handler POSTs JSON to the configured create URL and sets header **`x-sn-apikey`** from the API key field. **Do not** log or document real values.
+
+## PDF export
+
+`GET /v1/investigations/:id/export/pdf` returns an HTML-based report suitable for PDF conversion in the UI pipeline (see `managed/services/investigations/export.go`).
+
+## Related code
+
+| Area | Path |
+| ---- | ---- |
+| HTTP dispatch | `managed/services/investigations/handlers.go` |
+| Chat + run + background | `managed/services/investigations/chat.go` |
+| ServiceNow | `managed/services/investigations/servicenow.go` |
+| Report formatting | `managed/services/investigations/format_report.go` |
+| Holmes client | `managed/services/adre/client.go` |
+
+## End-to-end sequence (mermaid)
+
+```mermaid
+sequenceDiagram
+  participant UI as PMM_UI
+  participant PMM as pmm_managed
+  participant Holmes as HolmesGPT
+  UI->>PMM: POST /v1/investigations/:id/run
+  PMM-->>UI: 202 Accepted
+  PMM->>Holmes: Chat (/api/chat)
+  Holmes-->>PMM: analysis markdown
+  PMM->>Holmes: Format report (Chat)
+  Holmes-->>PMM: structured JSON
+  PMM->>PMM: Persist blocks and timeline
+```
+
+User-facing overview: [investigations.md](../../documentation/docs/use/ai-features/investigations.md).
@@ -0,0 +1,38 @@
+# ADR-001: PMM AI Investigations
+
+## Status
+
+Accepted.
+
+## Context
+
+PMM needs a first-class Investigations feature that combines:
+
+- A configurable local LLM (Ollama by default) as the orchestrator for the user-facing chat.
+- HolmesGPT as a tool the orchestrator can call for observability and database analysis.
+- Persistent incident pages (reports) with blocks, comments, chat, and PDF export.
+- Clear separation: normal chat is Q&A only; full investigation/report is triggered by an explicit "Run investigation" action and may involve a multi-turn loop between the orchestrator and HolmesGPT.
+
+Existing ADRE (HolmesGPT) integration provides the HolmesGPT client and alerts; it does not provide persistent investigations, block-based reports, or orchestrator-driven routing.
+
+## Decision
+
+- **Orchestrator**: Stateless service that receives investigation context and chat messages, calls a configurable LLM (Ollama default) with a tool registry. The LLM decides when to call HolmesGPT vs other tools vs answer directly (routing via tool definitions and system prompt).
+- **Investigations API**: REST API under `/v1/investigations` for CRUD on investigations, blocks, timeline, artifacts, comments, and messages. `POST /v1/investigations/:id/chat` invokes the orchestrator; `POST /v1/investigations/:id/run` (or equivalent) runs the full multi-turn investigation loop.
+- **Data model**: New tables for investigations, investigation_blocks, investigation_artifacts, investigation_messages, investigation_comments, investigation_timeline_events. Blocks are ordered and typed (summary, timeline, single_panel, panel_group, logs_view, query_result, finding, markdown, etc.); content varies per incident.
+- **No backward compatibility**: Replace ADRE direct-chat/investigate UX with Investigations; remove or make internal-only endpoints that are no longer needed.
+- **Config**: Orchestrator LLM configurable via env vars (`PMM_ORCHESTRATOR_LLM_PROVIDER`, `PMM_ORCHESTRATOR_LLM_URL`, `PMM_ORCHESTRATOR_LLM_MODEL`) and PMM settings (stored in extended Adre or dedicated settings section).
+
+## Consequences
+
+- Single Incident Detail Page component; report content is data-driven (blocks from API).
+- HolmesGPT is used as a tool; no change to HolmesGPT itself.
+- Operators must run Ollama (or another configured LLM) for Investigations chat and "Run investigation" to work.
+
+## Implementation note (tibi-holmes / current tree)
+
+The shipped UI includes **both** **ADRE Chat** (floating widget) and **Investigations**; ADRE direct chat was not removed.
+
+Investigation **chat** and **run** are implemented against the configured **HolmesGPT** URL (`adre.Client`) via **`POST /api/chat`**, with prompts and **`behavior_controls`** from PMM settings — not a separate in-repo Ollama orchestrator service. See `managed/services/investigations/chat.go` and [dev/investigations/README.md](https://github.com/percona/pmm/blob/v3/dev/investigations/README.md) for the actual request flow.
+
+End-user overview: [AI features — Investigations](../use/ai-features/investigations.md).