Draft
Changes from 72 commits
118 commits
e9b7aa8
fix: docker menu
fabio-silva Jan 22, 2026
63d562a
chore: update grafana packages
fabio-silva Jan 22, 2026
5d334b6
fix: styles
fabio-silva Jan 23, 2026
0c4f150
fix: API tests
fabio-silva Jan 23, 2026
a2d78ec
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva Jan 28, 2026
9332a92
chore: remove comment
fabio-silva Jan 28, 2026
e6365b1
Update api-tests/management/nodes_test.go
fabio-silva Jan 28, 2026
eb03907
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva Jan 28, 2026
5dc5882
fix: unit tests
fabio-silva Jan 28, 2026
df474b7
Merge branch 'PMM-14213-grafana-12.3.1' of https://github.com/percona…
fabio-silva Jan 28, 2026
4a32950
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva Feb 2, 2026
72d6022
Add ADRE (HolmesGPT) settings and HTTP client
theTibi Mar 6, 2026
e3c354d
Add ADRE (Autonomous Database Reliability Engineer) integration
theTibi Mar 6, 2026
25b5419
Update ADRE settings and improve error handling
theTibi Mar 7, 2026
e8ba94f
Enhance ADRE functionality and improve error handling
theTibi Mar 7, 2026
b7a9924
Update ChatStream endpoint URL and improve error messages
theTibi Mar 7, 2026
ca601f7
Refactor ADRE handlers to integrate Grafana Alertmanager
theTibi Mar 7, 2026
bbe3ecb
Enhance ADRE client to support authentication and update README
theTibi Mar 8, 2026
8f6f9cd
Enhance AdreAlertsPanel payload structure for investigations
theTibi Mar 8, 2026
3560422
Enhance AdreChat functionality to support reasoning and improve data …
theTibi Mar 8, 2026
f2b15fc
Refactor AdrePage layout to enhance UI presentation
theTibi Mar 8, 2026
092ead2
Implement ADRE streaming endpoint with extended timeout and enhance A…
theTibi Mar 8, 2026
398db42
Enhance ADRE settings and chat functionality
theTibi Mar 9, 2026
bd96bce
Add ADRE_URL environment variable and enhance validation tests
theTibi Mar 9, 2026
629d6a9
Refine DefaultInvestigationPrompt for clarity and usability
theTibi Mar 9, 2026
ac1eb15
Enhance ADRE settings and UI components for improved user experience
theTibi Mar 9, 2026
ee1368f
Remove redundant Stack component from AdrePage layout for cleaner UI …
theTibi Mar 9, 2026
de67142
Add custom tool integration and progress tracking to HolmesGPT
theTibi Mar 10, 2026
862b0cd
Reapply "PMM-14843: hide sidebar on headless mode for renderer (#5077…
fabio-silva Mar 10, 2026
3643ca7
Merge remote-tracking branch 'origin/tibi-test' into tibi-holmes
theTibi Mar 11, 2026
429d4e9
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva Mar 11, 2026
80dbee9
feat(investigations): phase1-ui — list + detail UI, basic blocks, com…
theTibi Mar 11, 2026
3a3a89e
feat(investigations): phase2-orch — orchestrator service, Ollama clie…
theTibi Mar 11, 2026
128a0ed
feat(investigations): phase2-chat — POST :id/chat, persist messages, …
theTibi Mar 11, 2026
4fb9aaa
feat(investigations): phase2-run — Run investigation button + POST :i…
theTibi Mar 11, 2026
fa125ac
feat(investigations): phase3-holmes — holmes_investigate tool, adapte…
theTibi Mar 11, 2026
042a5f8
feat(investigations): phase4 — block type constants and frontend bloc…
theTibi Mar 11, 2026
575600f
feat(investigations): phase5 — GET :id/export/pdf HTML report, Export…
theTibi Mar 11, 2026
ea85d30
feat(investigations): phase6 — UX: status workflow, block reorder/del…
theTibi Mar 11, 2026
62588ea
feat(investigations): add initial ADRs and data model for AI investig…
theTibi Mar 11, 2026
377b27e
fix(investigations): update config structure in PanelBlock component …
theTibi Mar 11, 2026
6844d15
feat(adre): add AI Assistant settings page and integrate with navigation
theTibi Mar 11, 2026
6ae02e7
feat(adre): enhance settings and chat functionality with orchestrator…
theTibi Mar 12, 2026
a9abdb7
feat(adre): enhance PMM Agent functionality and settings management
theTibi Mar 12, 2026
1cce307
feat(adre): improve chat client timeout handling and enhance conversa…
theTibi Mar 13, 2026
9e2f507
feat(investigations): enhance investigation listing and settings mana…
theTibi Mar 13, 2026
e4eedfb
refactor(adre): simplify chat configuration and UI labels
theTibi Mar 13, 2026
bd20e4d
feat(investigations): enhance investigation report formatting and tim…
theTibi Mar 14, 2026
bb5d5c0
feat(investigations): ensure system role in conversation history for …
theTibi Mar 14, 2026
dee935c
feat(investigations): implement investigation deletion and related cl…
theTibi Mar 14, 2026
a1e9b10
feat(adre): enhance investigation reporting and alert management
theTibi Mar 15, 2026
50b5634
Merge branch 'tibi-test' into tibi-holmes
theTibi Mar 16, 2026
461f146
fix(investigations): update import path for Percona UI components
theTibi Mar 16, 2026
599c38f
Merge branch 'tibi-test' into tibi-holmes
theTibi Mar 16, 2026
36a1ef2
feat(grafana): add Grafana render API support and enhance UI integration
theTibi Mar 16, 2026
fcc61a9
Format Go code (struct alignment, imports, indentation)
theTibi Mar 16, 2026
6a45881
Enhance settings tests and update Otel config tests
theTibi Mar 16, 2026
55cb125
feat(grafana): enhance Grafana panel rendering and URL handling
theTibi Mar 16, 2026
fa6c741
Refactor AdrePage and AdreChatPanel components for improved UI consis…
theTibi Mar 16, 2026
ef5e7c6
feat(grafana): implement optional disk caching for Grafana render images
theTibi Mar 16, 2026
cec1724
feat(grafana): add support for Grafana panel rendering with caching
theTibi Mar 17, 2026
f726b1e
feat(adre): enhance link styling in AdreChatPanel for improved UI
theTibi Mar 17, 2026
ba00e2e
feat(adre): add alert metadata extraction utility and integrate into …
theTibi Mar 17, 2026
718c0de
fix(investigation): refine alert metadata fetching logic in Investiga…
theTibi Mar 17, 2026
a4d5424
feat(investigation): enhance investigation export PDF styling and met…
theTibi Mar 17, 2026
7af269e
feat(investigation): enhance investigation handling and metadata display
theTibi Mar 17, 2026
2426e83
feat(prompts): enhance workload investigation guidance and recommenda…
theTibi Mar 17, 2026
1ec0425
feat(qan): introduce QAN AI Insights feature and enhance settings man…
theTibi Mar 18, 2026
ed5162b
feat(adre): enhance Grafana panel rendering and link handling in Adre…
theTibi Mar 18, 2026
9458f77
feat(investigation): enhance investigation creation and alert metadat…
theTibi Mar 18, 2026
d76670b
refactor(qan-header): remove AI Insights button and clean up imports
theTibi Mar 18, 2026
53455b6
feat(settings): add ReplaceSystemPrompt option to settings and update…
theTibi Mar 19, 2026
b546aab
feat(adre): enhance workload and anomaly detection guidance in prompts
theTibi Mar 19, 2026
c592d30
feat(servicenow): integrate ServiceNow ticketing system into investig…
theTibi Mar 19, 2026
f8917b1
feat(adre): add Chip component to AdreSettingsPage for enhanced UI
theTibi Mar 19, 2026
6f2ef3a
feat(investigation): add ServiceNow ticket number support and enhance…
theTibi Mar 19, 2026
7874397
feat(adre): implement QAN insights caching and enhance API handling
theTibi Mar 19, 2026
0e7711d
feat(adre): add PromptMaxBytes setting and enhance prompt validation
theTibi Mar 20, 2026
b6c1095
feat(adre): enhance Prometheus metric discovery and anomaly detection…
theTibi Mar 21, 2026
da2f014
fix(adre): refine scroll root reference in AdreChatWidget
theTibi Mar 22, 2026
19cf9bb
feat(adre): enhance Grafana context handling in chat widget
theTibi Mar 22, 2026
8bed343
feat(adre): update README and documentation for AI features integration
theTibi Mar 22, 2026
4b83484
feat(adre): enhance Grafana context handling in chat requests
theTibi Mar 22, 2026
1475828
feat(adre): implement leading system message for conversation history
theTibi Mar 22, 2026
ff8d40d
feat(adre): update ADRE settings and behavior controls
theTibi Mar 24, 2026
11d415e
feat(adre): handle error events in chat stream and improve error mess…
theTibi Mar 24, 2026
9de970d
feat(adre): add chat model settings and validation
theTibi Mar 25, 2026
663104c
feat(adre): improve chat error handling and user feedback
theTibi Mar 26, 2026
4aa6e34
feat(adre): enhance database parameter safety guidelines
theTibi Mar 26, 2026
b7006a6
feat(adre): add QAN Insights model configuration and validation
theTibi Mar 26, 2026
121eea3
feat(adre): update default chat mode to investigation
theTibi Mar 26, 2026
134ce5f
Merge branch 'tibi-test' into tibi-holmes
theTibi Mar 30, 2026
2201220
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva Mar 30, 2026
97d93a7
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva Mar 31, 2026
979bb93
feat(adre): add ServiceNow integration for QAN insights
theTibi Apr 2, 2026
cf76a44
feat(adre): enhance QAN insights handling and introduce frontend tools
theTibi Apr 2, 2026
3991c45
Merge branch 'tibi-test' into tibi-holmes
theTibi Apr 2, 2026
c6f70c5
Bump codecov/codecov-action from 5.5.2 to 6.0.0 (#5210)
dependabot[bot] Apr 3, 2026
c8246ba
Bump docker/login-action from 4.0.0 to 4.1.0 (#5211)
dependabot[bot] Apr 3, 2026
75f106c
PMM-14856 Make multiline output of telemetry metrics (#5171)
maxkondr Apr 6, 2026
4dc0e92
chore: bump grafana to v12.4.2
fabio-silva Apr 6, 2026
9ead0eb
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva Apr 6, 2026
224eaaf
fix: tests
fabio-silva Apr 6, 2026
df982c7
PMM-14951 PMM-14957 Dashboard variable sharing fix (#5177)
matejkubinec Apr 6, 2026
a0e36cd
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva Apr 6, 2026
5faefa3
PMM-14825 Fix tests related to disabling telemetry in FB (#5213)
ademidoff Apr 6, 2026
38c9b56
replace context.Background() with t.Context() in managed test files (…
Copilot Apr 7, 2026
3bcc2e4
Merge branch 'tibi-test' into tibi-holmes
theTibi Apr 7, 2026
4652505
fix: dependencies
fabio-silva Apr 7, 2026
2c8cf07
Merge branch 'PMM-14213-grafana-12.3.1' of https://github.com/percona…
fabio-silva Apr 7, 2026
d2bcd60
Encryption - fix typos and improve clarity (#4705)
ademidoff Apr 7, 2026
291c4f5
PMM-14940 Open external links in iframe in new tab (#5169)
matejkubinec Apr 8, 2026
94b0e8a
Merge branch 'v3' into PMM-14213-grafana-12.3.1
ademidoff Apr 8, 2026
81ff348
feat(kubernetes): add reference DaemonSet and README for coroot-node-…
theTibi Apr 8, 2026
484f215
Merge branch 'PMM-14213-grafana-12.3.1' into tibi-holmes (Grafana 12.…
theTibi Apr 8, 2026
641c841
feat(pmm-service-map): add new service map panel and related configur…
theTibi Apr 9, 2026
027b570
Update PMM service map panel and related configurations
theTibi Apr 9, 2026
873063c
feat(aiInsights): add AI Insights tab and related functionality
theTibi Apr 9, 2026
17 changes: 17 additions & 0 deletions build/ansible/roles/nginx/files/conf.d/pmm.conf
@@ -264,6 +264,23 @@

```nginx
    client_max_body_size 0;
}

# ADRE streaming endpoints - longer timeout for HolmesGPT investigate/chat
location /v1/adre/ {
    proxy_pass http://managed-json/v1/adre/;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_read_timeout 600;
    proxy_buffering off;
}

# Grafana panel render (can take 20–60s); longer read timeout, enable disk cache via cache=1
location /v1/grafana/render {
    proxy_pass http://managed-json;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_read_timeout 120;
}

# pmm-managed JSON APIs
location /v1/ {
    proxy_pass http://managed-json/v1/;
```
152 changes: 152 additions & 0 deletions dev/adre/README.md
@@ -0,0 +1,152 @@
# Autonomous Database Reliability Engineer (ADRE) / HolmesGPT Integration

ADRE integrates [HolmesGPT](https://holmesgpt.dev) with PMM to provide AI-assisted database reliability analysis, chat, and alert investigation.

This branch targets **HolmesGPT 0.22+**: PMM uses **`POST /api/chat` only** (no `/api/investigate`), and tunes behaviour via **`behavior_controls`** in settings.

## Prerequisites

- HolmesGPT running in a container (or elsewhere) and reachable from the PMM server
- Optional: [mcp-clickhouse](https://github.com/ClickHouse/mcp-clickhouse) for ClickHouse/otel.logs/QAN analysis

## Configuration

1. Enable ADRE in **PMM Settings** (Configuration → Settings → Advanced) or on the ADRE / AI Assistant page (admin only).
2. Set the **HolmesGPT base URL** to a reachable HTTPS (or HTTP in lab) origin, for example `https://holmes.example.internal` — **do not** commit real hosts or secrets to documentation.
3. If HolmesGPT requires authentication, configure it through **PMM settings** (preferred) or follow HolmesGPT’s documented URL/header patterns. **Never** paste API keys, Grafana tokens, or passwords into public docs or chat logs.

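PMM must be able to reach the HolmesGPT URL, and HolmesGPT must be able to reach the PMM data sources it queries (see **HolmesGPT Configuration** below). If using Docker or Kubernetes, ensure network policies and TLS match your security requirements.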

### Fast vs Investigation (`default_chat_mode`, `mode` on chat)

The ADRE panel and `POST /v1/adre/chat` support two modes: **Fast** (quick answers, with runbooks and TodoWrite kept to a minimum by default) and **Investigation** (full investigation behaviour). The differences come from Holmes **`behavior_controls`** maps stored in PMM settings (`behavior_controls_fast`, `behavior_controls_investigation`) plus separate **`additional_system_prompt`** texts (`chat_prompt`, `investigation_prompt`). See [Holmes fast mode / prompt controls](https://holmesgpt.dev/dev/reference/http-api/?h=fast#fast-mode--prompt-controls).

A third map, **`behavior_controls_format_report`**, applies only to the investigation report formatting pass.
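
A minimal Go sketch of how a request's `mode` could select one of the stored maps and prompts. It assumes that an omitted `mode` falls back to `default_chat_mode`, that legacy `chat` means `fast` (as the API table states), and that unknown values are treated as `fast`. The `adreSettings` struct, the `map[string]bool` value type, and the control keys are hypothetical, not pmm-managed's real types.

```go
package main

import (
	"fmt"
	"strings"
)

// adreSettings is a hypothetical subset of the ADRE settings described above.
type adreSettings struct {
	DefaultChatMode               string
	ChatPrompt                    string
	InvestigationPrompt           string
	BehaviorControlsFast          map[string]bool // assumed value type
	BehaviorControlsInvestigation map[string]bool // assumed value type
}

// resolveMode normalizes the requested mode. Assumptions: an empty mode uses
// DefaultChatMode; "investigation" selects Investigation; everything else
// ("fast", legacy "chat", unknown values) selects Fast.
func resolveMode(requested string, s adreSettings) string {
	m := strings.ToLower(strings.TrimSpace(requested))
	if m == "" {
		m = strings.ToLower(strings.TrimSpace(s.DefaultChatMode))
	}
	if m == "investigation" {
		return "investigation"
	}
	return "fast"
}

// controlsFor returns the behavior_controls map and additional_system_prompt
// text that would be sent to Holmes for the resolved mode.
func controlsFor(mode string, s adreSettings) (map[string]bool, string) {
	if mode == "investigation" {
		return s.BehaviorControlsInvestigation, s.InvestigationPrompt
	}
	return s.BehaviorControlsFast, s.ChatPrompt
}

func main() {
	s := adreSettings{
		DefaultChatMode:     "investigation",
		ChatPrompt:          "Answer briefly.",
		InvestigationPrompt: "Investigate thoroughly.",
		// Hypothetical keys; real control names come from the Holmes docs.
		BehaviorControlsFast:          map[string]bool{"todowrite": false},
		BehaviorControlsInvestigation: map[string]bool{"todowrite": true},
	}
	for _, req := range []string{"", "chat", "fast", "investigation"} {
		mode := resolveMode(req, s)
		_, prompt := controlsFor(mode, s)
		fmt.Printf("%q -> %s (%s)\n", req, mode, prompt)
	}
}
```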

**`adre_max_conversation_messages`** caps how many messages PMM sends as `conversation_history` to Holmes (mitigates context overflow when Holmes fails fast on oversized prompts).
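
As an illustration, a cap like `adre_max_conversation_messages` can be applied by keeping only the most recent messages. This Go sketch assumes that a leading `system` message should survive trimming (and counts toward the cap) and that a cap of 0 or less means "no limit"; the `message` type and `trimHistory` are hypothetical names.

```go
package main

import "fmt"

// message is a hypothetical conversation_history entry.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// trimHistory keeps at most limit messages. Assumptions: a leading system
// message is always kept and counts toward the limit; limit <= 0 disables
// trimming.
func trimHistory(history []message, limit int) []message {
	if limit <= 0 || len(history) <= limit {
		return history
	}
	if history[0].Role == "system" {
		if limit == 1 {
			return history[:1]
		}
		tail := history[len(history)-(limit-1):]
		out := make([]message, 0, limit)
		out = append(out, history[0])
		return append(out, tail...)
	}
	return history[len(history)-limit:]
}

func main() {
	h := []message{
		{Role: "system", Content: "You are ADRE."},
		{Role: "user", Content: "q1"},
		{Role: "assistant", Content: "a1"},
		{Role: "user", Content: "q2"},
		{Role: "assistant", Content: "a2"},
	}
	for _, m := range trimHistory(h, 3) {
		fmt.Println(m.Role, m.Content)
	}
}
```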

**`ENABLED_PROMPTS` on the Holmes container** can restrict what the HTTP API is allowed to enable. If operators set it restrictively, PMM behaviour-control toggles may appear to have no effect; record any such restriction next to the AI Assistant settings documentation for your environment.

Investigations and QAN insights call the Holmes client against **`Adre.URL`** only (no separate PMM Agent path).

## HolmesGPT Configuration

Configure HolmesGPT to use PMM data sources:

- **Prometheus**: `https://<pmm-host>/victoriametrics/` (with auth if required)
- **Alertmanager**: `https://<pmm-host>/prometheus/alerts` (or internal URL if same network)

## ClickHouse (Logs, QAN)

HolmesGPT has no built-in ClickHouse toolset. To enable log and QAN analysis:

1. Run [mcp-clickhouse](https://github.com/ClickHouse/mcp-clickhouse) in a container
2. Point it at PMM’s ClickHouse (host, port, user, password must be reachable from HolmesGPT)
3. Add it as an MCP server in HolmesGPT config (streamable-http transport)
   - Example: `url: "http://mcp-clickhouse:8000/mcp/messages"`, `mode: streamable-http`

PMM does not run or configure mcp-clickhouse; you manage it and HolmesGPT configuration yourself.

## Adding custom tools to HolmesGPT

HolmesGPT supports two ways to add your own tools:

### 1. Custom toolsets (YAML)

Define tools as shell commands in a `toolsets.yaml` file. Each tool has a `name`, `description`, and `command`; the LLM infers parameters from `{{ variable }}` placeholders. Use this for scripts, `curl` calls to APIs, or `kubectl`/CLI commands.

- **CLI:** `holmes ask "your question" --custom-toolsets=toolsets.yaml`; after editing, run `holmes toolset refresh`.
- **Helm:** Configure under `holmes.customToolsets` in your values.

See [HolmesGPT Custom Toolsets](https://holmesgpt.dev/data-sources/custom-toolsets/).

### 2. MCP servers (recommended for new integrations)

Implement an [MCP](https://modelcontextprotocol.io/) server that exposes tools; HolmesGPT connects to it and discovers tools dynamically.

- **Transport:** Prefer `streamable-http`: your server exposes an HTTP endpoint (e.g. `http://your-mcp:8000/mcp/messages`); HolmesGPT calls it with `mode: streamable-http`.
- **Config:** Add the server under `mcp_servers` in `~/.holmes/config.yaml` or in Helm under `holmes.mcp_servers`, with `config.url`, `config.mode`, optional `config.headers`, and `llm_instructions` (when/how the LLM should use it).

Example (config file):

```yaml
mcp_servers:
  my_tools:
    description: "My custom PMM tools"
    config:
      url: "http://my-mcp-server:8000/mcp/messages"
      mode: streamable-http
    llm_instructions: "Use these tools for schema, EXPLAIN, and index inspection when investigating database issues."
```

If your MCP server runs inside or alongside PMM, ensure HolmesGPT can reach it (network, auth, and security as discussed earlier).

See [HolmesGPT MCP Servers](https://holmesgpt.dev/data-sources/remote-mcp-servers/).

## Grafana context in ADRE Chat (PMM UI)

The PMM shell builds **structured Grafana context** when the user is on Grafana routes (`/graph/d/...`, `d-solo`, `explore`, etc.): normalized path, dashboard UID, `viewPanel` when present, `from`/`to`, `var-*` parameters, optional **document title** from the iframe. Implementation: `ui/apps/pmm/src/components/adre/grafana-context.ts` (fragment; `GrafanaProvider` supplies `grafanaDocumentTitle`).
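
The real implementation is TypeScript (`grafana-context.ts`). Purely to illustrate which fields are extracted, here is a Go sketch using `net/url`. The `grafanaContext` struct, the rule that the dashboard UID is the path segment after `d` or `d-solo`, and keeping only the first value of a repeated `var-*` parameter are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// grafanaContext is a hypothetical Go mirror of the fields listed above.
type grafanaContext struct {
	Path         string
	DashboardUID string
	ViewPanel    string
	From, To     string
	Vars         map[string]string
}

// parseGrafanaContext extracts context from a URL such as
// /graph/d/<uid>/<slug>?from=...&to=...&var-x=...&viewPanel=N.
func parseGrafanaContext(raw string) (grafanaContext, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return grafanaContext{}, err
	}
	gc := grafanaContext{Path: u.Path, Vars: map[string]string{}}
	segs := strings.Split(strings.Trim(u.Path, "/"), "/")
	for i, s := range segs {
		// Assumption: the UID follows a "d" or "d-solo" segment.
		if (s == "d" || s == "d-solo") && i+1 < len(segs) {
			gc.DashboardUID = segs[i+1]
			break
		}
	}
	q := u.Query()
	gc.ViewPanel = q.Get("viewPanel")
	gc.From = q.Get("from")
	gc.To = q.Get("to")
	for k, v := range q {
		if name, ok := strings.CutPrefix(k, "var-"); ok && len(v) > 0 {
			gc.Vars[name] = v[0] // keep the first value only
		}
	}
	return gc, nil
}

func main() {
	gc, err := parseGrafanaContext("/graph/d/example-uid/example-slug?from=now-1h&to=now&var-service_name=mysql-1&viewPanel=12")
	if err != nil {
		panic(err)
	}
	fmt.Println(gc.DashboardUID, gc.ViewPanel, gc.From, gc.To)
	keys := make([]string, 0, len(gc.Vars))
	for k := range gc.Vars {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("var-%s=%s\n", k, gc.Vars[k])
	}
}
```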

The UI sends it as **`dashboard_context`** on `POST /v1/adre/chat`. **pmm-managed** appends it to Holmes **`additional_system_prompt`** (alongside the mode-specific prompt).
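
A minimal sketch of the merge step, assuming `dashboard_context` arrives as a plain string and is appended after the mode-specific prompt, separated by a blank line and a short heading. The heading text and `buildAdditionalSystemPrompt` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// buildAdditionalSystemPrompt combines the mode-specific prompt with the
// optional dashboard_context; empty parts are skipped.
func buildAdditionalSystemPrompt(modePrompt, dashboardContext string) string {
	parts := make([]string, 0, 2)
	if p := strings.TrimSpace(modePrompt); p != "" {
		parts = append(parts, p)
	}
	if d := strings.TrimSpace(dashboardContext); d != "" {
		// Hypothetical heading so the model can tell the context apart.
		parts = append(parts, "Current Grafana context:\n"+d)
	}
	return strings.Join(parts, "\n\n")
}

func main() {
	fmt.Println(buildAdditionalSystemPrompt(
		"Answer concisely.",
		"dashboard_uid=abc123 viewPanel=7 from=now-1h to=now",
	))
}
```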

## Holmes operator configuration (not shipped inside PMM)

PMM **does not** ship `holmes_config.yaml` or Markdown **runbooks** in the repository. Operators maintain them on the **HolmesGPT** deployment:

- **Toolsets** — Often defined in YAML (custom toolsets) or via **MCP** servers. Point Prometheus/VictoriaMetrics, PMM inventory tools, ClickHouse (QAN/logs), and optional `curl` tools at URLs reachable from Holmes (see [HolmesGPT docs](https://holmesgpt.dev)).
- **Runbooks** — Markdown files plus a **catalog** (e.g. `catalog.json`) so the `fetch_runbook` tool can load steps. Paths are configured in Holmes, not in PMM.
- **PMM-facing URLs** — Use a **browser-reachable** PMM base URL for markdown images and Grafana links where Holmes embeds `/v1/grafana/render` or `/graph/...`.

## `GET /v1/grafana/render` (panel image proxy)

Served by **pmm-managed**. Used by Holmes toolsets or scripts to fetch a **PNG** of a dashboard panel or to return **JSON** with URLs for the PMM UI.

**Required query parameters:** `dashboard_uid`, `panel_id`, `from`, `to`.

**Common optional parameters:** `width`, `height`, `format=json` (returns JSON with `image_url` and `dashboard_url` instead of raw PNG), `cache=1` (optional **disk cache** under `/srv/pmm/grafana_render_cache` on the server), `tz`, and any `var-*` Grafana template variables needed for the dashboard (e.g. `var-service_id`).
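
The parameters above can be assembled with `net/url`. In this illustrative Go sketch the base URL, dashboard UID, panel ID, and `var-service_id` value are placeholders, and `renderURL` is a hypothetical helper, not part of pmm-managed.

```go
package main

import (
	"fmt"
	"net/url"
)

// renderURL builds a GET /v1/grafana/render URL from the required parameters
// plus optional extras (width, height, format, cache, tz, var-*).
func renderURL(base, dashboardUID, panelID, from, to string, extra map[string]string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/grafana/render"
	q := url.Values{}
	q.Set("dashboard_uid", dashboardUID)
	q.Set("panel_id", panelID)
	q.Set("from", from)
	q.Set("to", to)
	for k, v := range extra {
		q.Set(k, v)
	}
	u.RawQuery = q.Encode() // Encode sorts keys, so the output is deterministic
	return u.String(), nil
}

func main() {
	s, err := renderURL("https://pmm.example.internal", "abc123", "7", "now-1h", "now", map[string]string{
		"format":         "json",
		"cache":          "1",
		"width":          "1000",
		"height":         "500",
		"var-service_id": "example-service-id",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```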

**Validation:** `dashboard_uid` and `panel_id` must match safe character classes enforced by the handler.
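
The exact character classes live in the handler and are not reproduced here. As a hedged sketch, assuming UIDs are limited to letters, digits, `-` and `_` (at most 40 characters) and panel IDs to decimal digits, such checks could look like this; both patterns are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Assumed patterns; the real handler's character classes may differ.
var (
	dashboardUIDRe = regexp.MustCompile(`^[A-Za-z0-9_-]{1,40}$`)
	panelIDRe      = regexp.MustCompile(`^[0-9]{1,10}$`)
)

// validateRenderParams rejects values that could smuggle path segments or
// query syntax into the upstream Grafana render URL.
func validateRenderParams(dashboardUID, panelID string) error {
	if !dashboardUIDRe.MatchString(dashboardUID) {
		return errors.New("invalid dashboard_uid")
	}
	if !panelIDRe.MatchString(panelID) {
		return errors.New("invalid panel_id")
	}
	return nil
}

func main() {
	for _, tc := range [][2]string{{"abc-123_X", "7"}, {"../admin", "7"}, {"abc", "7&x=1"}} {
		fmt.Printf("%q %q -> %v\n", tc[0], tc[1], validateRenderParams(tc[0], tc[1]))
	}
}
```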

**Auth:** Forwarding uses the caller’s `Authorization` header when calling Grafana’s render path.
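
A minimal sketch of that forwarding, assuming the upstream Grafana render URL has already been built. The upstream URL in `main` is a placeholder, and `newGrafanaRenderRequest` is hypothetical; it omits pmm-managed's real routing and error handling.

```go
package main

import (
	"fmt"
	"net/http"
)

// newGrafanaRenderRequest creates the upstream request and copies the caller's
// Authorization header, so Grafana evaluates the caller's own permissions.
func newGrafanaRenderRequest(incomingAuth, upstreamURL string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, upstreamURL, nil)
	if err != nil {
		return nil, err
	}
	if incomingAuth != "" {
		req.Header.Set("Authorization", incomingAuth)
	}
	return req, nil
}

func main() {
	incoming, err := http.NewRequest(http.MethodGet, "/v1/grafana/render?dashboard_uid=abc&panel_id=7&from=now-1h&to=now", nil)
	if err != nil {
		panic(err)
	}
	incoming.Header.Set("Authorization", "Bearer <redacted>")

	// Placeholder upstream URL; the real Grafana address is internal to PMM.
	req, err := newGrafanaRenderRequest(incoming.Header.Get("Authorization"), "http://grafana.internal/render/d-solo/abc?panelId=7")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, "auth forwarded:", req.Header.Get("Authorization") != "")
}
```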

For **end-user** documentation, panel-image behaviour is intentionally **not** expanded in MkDocs; this section is for **integrators**.

## Grafana panel render and dashboard links (Holmes / tools)

When Holmes (or a tool) renders a Grafana panel image via PMM’s render API and includes an “Open in Grafana” link in the same message, follow this contract so the UI shows one correct link per panel:

1. **Use the render tool’s `dashboard_url`.** When the render tool (e.g. calling PMM `GET /v1/grafana/render?format=json`) returns `image_url` and `dashboard_url`, the model must use that exact `dashboard_url` for any “Open in Grafana” (or “Open the … panel”) link in the same message as the panel image. Do not construct the dashboard link from other parameters or default time ranges; otherwise the link can have the wrong timeframe.

2. **Match panel to narrative.** The panel id (and dashboard) used for the render must match what the model describes (e.g. if the answer says “QPS graph”, the rendered panel must be the QPS panel, not a different one like “MySQL Connections”).

3. **Duplicate links are suppressed by PMM.** The PMM UI hides duplicate “Open in Grafana” links in markdown when they point to a panel that already has a render image in the same message; only the link under the image (with the correct timeframe) is shown. One link per panel, taken from the render tool response, is therefore enough.
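
For a tool author, the contract can be sketched in Go: decode the `format=json` response and emit the image and its link from the same payload. Only `image_url` and `dashboard_url` come from this README; the `renderResult` type, the markdown layout, and the sample URLs are assumptions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// renderResult holds the two fields documented for format=json responses.
type renderResult struct {
	ImageURL     string `json:"image_url"`
	DashboardURL string `json:"dashboard_url"`
}

// panelMarkdown returns one image plus one "Open in Grafana" link, both taken
// verbatim from the render response so the timeframe stays consistent.
func panelMarkdown(body []byte, title string) (string, error) {
	var r renderResult
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.ImageURL == "" || r.DashboardURL == "" {
		return "", errors.New("render response lacks image_url or dashboard_url")
	}
	return fmt.Sprintf("![%s](%s)\n\n[Open in Grafana](%s)", title, r.ImageURL, r.DashboardURL), nil
}

func main() {
	// Sample response with placeholder URLs.
	body := []byte(`{"image_url":"/v1/grafana/render?dashboard_uid=abc&panel_id=7&from=now-1h&to=now","dashboard_url":"/graph/d/abc?viewPanel=7&from=now-1h&to=now"}`)
	md, err := panelMarkdown(body, "QPS")
	if err != nil {
		panic(err)
	}
	fmt.Println(md)
}
```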

## API

PMM proxies requests to HolmesGPT where noted. Endpoints **require PMM authentication** unless stated otherwise.

| Method | Path | Description |
|--------|------|-------------|
| GET | /v1/adre/settings | Get ADRE settings (Holmes URL, `behavior_controls_*`, prompts, `adre_max_conversation_messages`, QAN prompt display fields, ServiceNow configured flag — no secrets in GET) |
| POST | /v1/adre/settings | Update ADRE settings (admin); may set `servicenow_url`, `servicenow_api_key`, `servicenow_client_token` — store securely |
| GET | /v1/adre/models | List available models from HolmesGPT when ADRE enabled |
| POST | /v1/adre/chat | Chat; `stream: true` for SSE streaming; optional `mode`: `fast` or `investigation` (legacy `chat` treated as `fast`); optional `dashboard_context` merged into Holmes `additional_system_prompt` |
| GET | /v1/adre/alerts | Firing alerts from Grafana Alertmanager (ADRE enabled) |
| POST | /v1/adre/qan-insights | Body: `service_id`, `query_text` (required); optional `query_id`, `fingerprint`, `time_from`, `time_to`, `force`. Returns analysis JSON; caches by `(query_id, service_id)` when `query_id` set |
| GET | /v1/adre/qan-insights | Query params: `query_id`, `service_id` — returns cached analysis or 404 |
| GET | /v1/grafana/render | Panel PNG or JSON (`format=json`); see section above |
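
As an example of the `POST /v1/adre/qan-insights` body, this Go sketch marshals the documented fields. The field names come from the table above; the Go types (strings for `time_from`/`time_to`, a bool for `force`) and all sample values are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// qanInsightsRequest mirrors the documented body fields; omitempty drops unset
// optional fields. Types of time_from, time_to, and force are assumed.
type qanInsightsRequest struct {
	ServiceID   string `json:"service_id"`
	QueryText   string `json:"query_text"`
	QueryID     string `json:"query_id,omitempty"`
	Fingerprint string `json:"fingerprint,omitempty"`
	TimeFrom    string `json:"time_from,omitempty"`
	TimeTo      string `json:"time_to,omitempty"`
	Force       bool   `json:"force,omitempty"`
}

// body returns the JSON request body.
func (r qanInsightsRequest) body() ([]byte, error) {
	return json.Marshal(r)
}

func main() {
	b, err := qanInsightsRequest{
		ServiceID: "example-service-id",
		QueryText: "SELECT * FROM orders WHERE customer_id = ?",
		QueryID:   "example-query-id",
	}.body()
	if err != nil {
		panic(err)
	}
	// With query_id set, the server caches by (query_id, service_id), so a later
	// GET /v1/adre/qan-insights?query_id=...&service_id=... may return it.
	fmt.Println(string(b))
}
```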

**Investigations** live under `/v1/investigations/*` — see [dev/investigations/README.md](../investigations/README.md).

### End-to-end flow (mermaid)

```mermaid
sequenceDiagram
participant User as PMM_UI
participant PMM as pmm_managed
participant Holmes as HolmesGPT
User->>PMM: POST /v1/adre/chat
PMM->>Holmes: Chat API
Holmes-->>PMM: analysis stream
PMM-->>User: SSE or JSON
```
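
With `stream: true`, the client reads Server-Sent Events. Event names and payload shapes are not documented here, so this Go sketch assumes only standard SSE framing (`data:` lines, events terminated by a blank line) and collects raw payloads. `parseSSE` and the sample payloads are hypothetical; a real client would read `http.Response.Body` incrementally instead of a string.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSSE returns the data payload of each complete event. Multi-line data
// fields are joined with "\n"; other fields (event:, id:) are ignored; an
// unterminated event at end of input is discarded, as the SSE format requires.
func parseSSE(stream string) ([]string, error) {
	var events, data []string
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			if len(data) > 0 {
				events = append(events, strings.Join(data, "\n"))
				data = nil
			}
		case strings.HasPrefix(line, "data:"):
			// Strip the field name and at most one leading space.
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
		}
	}
	return events, sc.Err()
}

func main() {
	stream := "data: {\"chunk\":\"Checking\"}\n\ndata: {\"chunk\":\" replication lag\"}\n\n"
	events, err := parseSSE(stream)
	if err != nil {
		panic(err)
	}
	for _, e := range events {
		fmt.Println(e)
	}
}
```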
94 changes: 94 additions & 0 deletions dev/investigations/README.md
@@ -0,0 +1,94 @@
# PMM Investigations (developer / operator notes)

**Investigations** are persisted incident pages under `/v1/investigations` in **pmm-managed**. The UI lists investigations, shows block-based reports, supports chat, **Run investigation**, **PDF export**, and optional **ServiceNow** ticket creation.

This file is **not** part of the published Percona MkDocs site; it lives next to the Go sources for contributors and operators.

## Architecture reference

- **ADR-001** — [0001-pmm-ai-investigations.md](../../documentation/docs/adr/0001-pmm-ai-investigations.md) (original orchestrator/Ollama narrative; see note below).
- **ADR-002** — [0002-investigations-data-model-and-api.md](../../documentation/docs/adr/0002-investigations-data-model-and-api.md) (data model and REST shape).

**Implementation note:** Investigation **chat** and **run** use **HolmesGPT** only (`adre.NewClient(settings.GetAdreURL())`): `POST /api/chat` with `investigation_prompt`, **`behavior_controls_investigation`**, and (for the formatting pass) **`behavior_controls_format_report`**. A separate Ollama orchestrator process is **not** required for that deployment model. ADR-001 remains historical context; align product docs with the code path you ship.

## Prerequisites

- **HolmesGPT URL** configured in PMM **AI Assistant / ADRE** settings (`GetAdreURL()` non-empty). Chat and run return HTTP 400 if missing.

## REST API summary

All routes are prefixed with `/v1/investigations`. Authenticate like other PMM APIs.

| Method | Path pattern | Purpose |
| ------ | ------------ | ------- |
| GET | `/v1/investigations` | List investigations |
| POST | `/v1/investigations` | Create investigation |
| GET | `/v1/investigations/:id` | Get one |
| PATCH | `/v1/investigations/:id` | Update metadata / status |
| DELETE | `/v1/investigations/:id` | Delete |
| GET/POST | `/v1/investigations/:id/blocks` | List / create blocks |
| PATCH/DELETE | `/v1/investigations/:id/blocks/:blockId` | Update / delete block |
| GET/POST | `/v1/investigations/:id/timeline` | Timeline events |
| GET/POST | `/v1/investigations/:id/artifacts` | Artifacts |
| GET/POST | `/v1/investigations/:id/comments` | Comments |
| GET | `/v1/investigations/:id/messages` | Chat message history |
| POST | `/v1/investigations/:id/chat` | One chat round (Holmes `/api/chat`) |
| POST | `/v1/investigations/:id/run` | Start background **Run investigation** (202 Accepted) |
| GET | `/v1/investigations/:id/export/pdf` | Export report (HTML for PDF conversion; see **PDF export**) |
| POST | `/v1/investigations/:id/servicenow` | Create ServiceNow ticket (requires settings) |

Details and JSON shapes: **ADR-002** and `managed/services/investigations/handlers.go`.

## Chat flow (`POST .../chat`)

1. Load investigation; validate Holmes URL.
2. Persist the user `message`.
3. Build `conversation_history` from stored messages (roles `user`, `assistant`, `tool`).
4. Call `adre.Client.Chat` with investigation context, **`behavior_controls_investigation`**, and trimmed history (`adre_max_conversation_messages`).
5. Persist assistant reply; return `{ "content": "..." }`.
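
From a client's point of view, one chat round sends `message` and reads back `content`. A minimal Go sketch, assuming those are the only fields that matter (ADR-002 and `handlers.go` are authoritative); the base URL, investigation ID, and sample reply are placeholders, and authentication is omitted.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

type chatRequest struct {
	Message string `json:"message"`
}

type chatReply struct {
	Content string `json:"content"`
}

// newInvestigationChatRequest builds POST /v1/investigations/:id/chat.
func newInvestigationChatRequest(baseURL, id, message string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{Message: message})
	if err != nil {
		return nil, err
	}
	u := baseURL + "/v1/investigations/" + url.PathEscape(id) + "/chat"
	req, err := http.NewRequest(http.MethodPost, u, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// decodeChatReply extracts the assistant text from the response body.
func decodeChatReply(body []byte) (string, error) {
	var r chatReply
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	return r.Content, nil
}

func main() {
	req, err := newInvestigationChatRequest("https://pmm.example.internal", "inv-123", "Why did replication lag spike?")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	// In real use, send req with an authenticated *http.Client; here we only
	// decode a sample reply.
	reply, err := decodeChatReply([]byte(`{"content":"sample assistant reply"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(reply)
}
```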

## Run investigation (`POST .../run`)

Returns **202** immediately; work continues in `runInvestigationBackground`:

1. Calls Holmes **`Chat`** (`/api/chat`) with a structured ask, investigation prompt, context, and **`behavior_controls_investigation`**.
2. **`FormatInvestigationReport`** — second LLM pass via `adre.Client.Chat` with **`behavior_controls_format_report`** to normalize markdown into JSON sections.
3. **`ParseFormattedReport`** creates **blocks** and **timeline** rows and updates investigation summary fields.
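
The formatted JSON schema is defined in `format_report.go` and is not described here. Purely as an illustration, this Go sketch assumes a hypothetical shape with `summary`, `sections`, and `timeline` keys and turns it into ordered blocks and timeline rows. The block type names `summary` and `markdown` are borrowed from ADR-001's list; every other name and the sample input are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// formattedReport is a HYPOTHETICAL shape for the formatting pass output.
type formattedReport struct {
	Summary  string `json:"summary"`
	Sections []struct {
		Title    string `json:"title"`
		Markdown string `json:"markdown"`
	} `json:"sections"`
	Timeline []struct {
		Time  string `json:"time"`
		Event string `json:"event"`
	} `json:"timeline"`
}

type block struct {
	Position int
	Type     string // "summary" or "markdown" in this sketch
	Title    string
	Body     string
}

type timelineEvent struct {
	Time, Event string
}

// parseReport converts the formatted JSON into ordered blocks and timeline rows.
func parseReport(raw []byte) ([]block, []timelineEvent, error) {
	var r formattedReport
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, nil, err
	}
	var blocks []block
	if r.Summary != "" {
		blocks = append(blocks, block{Position: len(blocks), Type: "summary", Body: r.Summary})
	}
	for _, s := range r.Sections {
		blocks = append(blocks, block{Position: len(blocks), Type: "markdown", Title: s.Title, Body: s.Markdown})
	}
	events := make([]timelineEvent, 0, len(r.Timeline))
	for _, t := range r.Timeline {
		events = append(events, timelineEvent{Time: t.Time, Event: t.Event})
	}
	return blocks, events, nil
}

func main() {
	raw := []byte(`{"summary":"Sample summary.","sections":[{"title":"Evidence","markdown":"Sample evidence."}],"timeline":[{"time":"2026-03-14T10:00:00Z","event":"Sample event"}]}`)
	blocks, events, err := parseReport(raw)
	if err != nil {
		panic(err)
	}
	for _, b := range blocks {
		fmt.Printf("%d %s %q\n", b.Position, b.Type, b.Title)
	}
	fmt.Println(len(events), "timeline event(s)")
}
```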

Timeouts: **5 minutes** for run and chat (see `investigationRunTimeout` / `investigationChatTimeout` in `chat.go`).
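
The accept-then-run pattern can be sketched as follows: the handler returns 202 at once, and a goroutine does the work under `context.WithTimeout` with the 5-minute value stated above. `runService`, `runTimeout`, and the stand-in work function are hypothetical; in pmm-managed the background work happens in `runInvestigationBackground`.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"sync"
	"time"
)

const runTimeout = 5 * time.Minute // value stated in this README

type runService struct {
	wg   sync.WaitGroup
	mu   sync.Mutex
	done []string
	// work stands in for the Holmes chat, formatting, and persistence steps.
	work func(ctx context.Context, id string) error
}

func newRunService() *runService {
	s := &runService{}
	s.work = func(ctx context.Context, id string) error {
		if err := ctx.Err(); err != nil {
			return err
		}
		s.mu.Lock()
		s.done = append(s.done, id)
		s.mu.Unlock()
		return nil
	}
	return s
}

// handleRun starts the work in the background and returns 202 immediately.
// It uses context.Background rather than a request context, so the work is
// not cancelled once the HTTP response has been sent.
func (s *runService) handleRun(id string) int {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		ctx, cancel := context.WithTimeout(context.Background(), runTimeout)
		defer cancel()
		if err := s.work(ctx, id); err != nil {
			// A real service would likely persist a failure status instead.
			fmt.Println("investigation", id, "failed:", err)
		}
	}()
	return http.StatusAccepted
}

func (s *runService) wait() { s.wg.Wait() }

func (s *runService) completed() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.done...)
}

func main() {
	s := newRunService()
	fmt.Println("status:", s.handleRun("inv-123"))
	s.wait()
	fmt.Println("completed:", s.completed())
}
```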

## ServiceNow (`POST .../servicenow`)

Requires **non-empty** `Adre.ServiceNowURL`, `ServiceNowAPIKey`, and `ServiceNowClientToken` in PMM settings (set via `POST /v1/adre/settings`). The handler POSTs JSON to the configured create URL and sets header **`x-sn-apikey`** from the API key field. **Do not** log or document real values.
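
A hedged Go sketch of the request described above: a JSON POST to the configured create URL with the `x-sn-apikey` header. The payload fields and URL are hypothetical (the real body is built in `servicenow.go`), and the sketch does not show how `ServiceNowClientToken` is used, because this README does not say.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// ticket is a HYPOTHETICAL payload; see servicenow.go for the real fields.
type ticket struct {
	ShortDescription string `json:"short_description"`
	Description      string `json:"description"`
}

// newServiceNowRequest builds the create-ticket request. The API key goes only
// into the header (Go canonicalizes it to X-Sn-Apikey; header names are
// case-insensitive) and is never printed.
func newServiceNowRequest(createURL, apiKey string, t ticket) (*http.Request, error) {
	body, err := json.Marshal(t)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, createURL, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("x-sn-apikey", apiKey)
	return req, nil
}

func main() {
	req, err := newServiceNowRequest("https://servicenow.example/api/create", "<redacted>", ticket{
		ShortDescription: "PMM investigation: sample title",
		Description:      "Sample summary and link to the PMM investigation.",
	})
	if err != nil {
		panic(err)
	}
	// Print only non-secret parts of the request.
	fmt.Println(req.Method, req.URL, req.Header.Get("Content-Type"))
}
```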

## PDF export

`GET /v1/investigations/:id/export/pdf` returns an HTML-based report suitable for PDF conversion in the UI pipeline (see `managed/services/investigations/export.go`).

## Related code

| Area | Path |
| ---- | ---- |
| HTTP dispatch | `managed/services/investigations/handlers.go` |
| Chat + run + background | `managed/services/investigations/chat.go` |
| ServiceNow | `managed/services/investigations/servicenow.go` |
| Report formatting | `managed/services/investigations/format_report.go` |
| Holmes client | `managed/services/adre/client.go` |

## End-to-end sequence (mermaid)

```mermaid
sequenceDiagram
participant UI as PMM_UI
participant PMM as pmm_managed
participant Holmes as HolmesGPT
UI->>PMM: POST /v1/investigations/:id/run
PMM-->>UI: 202 Accepted
PMM->>Holmes: Chat (/api/chat)
Holmes-->>PMM: analysis markdown
PMM->>Holmes: Format report (Chat)
Holmes-->>PMM: structured JSON
PMM->>PMM: Persist blocks and timeline
```

User-facing overview: [investigations.md](../../documentation/docs/use/ai-features/investigations.md).
38 changes: 38 additions & 0 deletions documentation/docs/adr/0001-pmm-ai-investigations.md
@@ -0,0 +1,38 @@
# ADR-001: PMM AI Investigations

## Status

Accepted.

## Context

PMM needs a first-class Investigations feature that combines:

- A configurable local LLM (Ollama by default) as the orchestrator for the user-facing chat.
- HolmesGPT as a tool the orchestrator can call for observability and database analysis.
- Persistent incident pages (reports) with blocks, comments, chat, and PDF export.
- Clear separation: normal chat is Q&A only; full investigation/report is triggered by an explicit "Run investigation" action and may involve a multi-turn loop between the orchestrator and HolmesGPT.

Existing ADRE (HolmesGPT) integration provides the HolmesGPT client and alerts; it does not provide persistent investigations, block-based reports, or orchestrator-driven routing.

## Decision

- **Orchestrator**: Stateless service that receives investigation context and chat messages, calls a configurable LLM (Ollama default) with a tool registry. The LLM decides when to call HolmesGPT vs other tools vs answer directly (routing via tool definitions and system prompt).
- **Investigations API**: REST API under `/v1/investigations` for CRUD on investigations, blocks, timeline, artifacts, comments, and messages. `POST /v1/investigations/:id/chat` invokes the orchestrator; `POST /v1/investigations/:id/run` (or equivalent) runs the full multi-turn investigation loop.
- **Data model**: New tables for investigations, investigation_blocks, investigation_artifacts, investigation_messages, investigation_comments, investigation_timeline_events. Blocks are ordered and typed (summary, timeline, single_panel, panel_group, logs_view, query_result, finding, markdown, etc.); content varies per incident.
- **No backward compatibility**: Replace ADRE direct-chat/investigate UX with Investigations; remove or make internal-only endpoints that are no longer needed.
- **Config**: Orchestrator LLM configurable via env vars (`PMM_ORCHESTRATOR_LLM_PROVIDER`, `PMM_ORCHESTRATOR_LLM_URL`, `PMM_ORCHESTRATOR_LLM_MODEL`) and PMM settings (stored in extended Adre or dedicated settings section).

## Consequences

- Single Incident Detail Page component; report content is data-driven (blocks from API).
- HolmesGPT is used as a tool; no change to HolmesGPT itself.
- Operators must run Ollama (or another configured LLM) for Investigations chat and "Run investigation" to work.

## Implementation note (tibi-holmes / current tree)

The shipped UI includes **both** **ADRE Chat** (floating widget) and **Investigations**; ADRE direct chat was not removed.

Investigation **chat** and **run** are implemented against the configured **HolmesGPT** URL (`adre.Client`) via **`POST /api/chat`**, with prompts and **`behavior_controls`** from PMM settings — not a separate in-repo Ollama orchestrator service. See `managed/services/investigations/chat.go` and [dev/investigations/README.md](https://github.com/percona/pmm/blob/v3/dev/investigations/README.md) for the actual request flow.

End-user overview: [AI features — Investigations](../use/ai-features/investigations.md).