-
Notifications
You must be signed in to change notification settings - Fork 201
Tibi holmes #5118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
theTibi
wants to merge
118
commits into
tibi-test
Choose a base branch
from
tibi-holmes
base: tibi-test
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Tibi holmes #5118
Changes from 72 commits
Commits
Show all changes
118 commits
Select commit
Hold shift + click to select a range
e9b7aa8
fix: docker menu
fabio-silva 63d562a
chore: update grafana packages
fabio-silva 5d334b6
fix: styles
fabio-silva 0c4f150
fix: API tests
fabio-silva a2d78ec
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva 9332a92
chore: remove comment
fabio-silva e6365b1
Update api-tests/management/nodes_test.go
fabio-silva eb03907
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva 5dc5882
fix: unit tests
fabio-silva df474b7
Merge branch 'PMM-14213-grafana-12.3.1' of https://github.com/percona…
fabio-silva 4a32950
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva 72d6022
Add ADRE (HolmesGPT) settings and HTTP client
theTibi e3c354d
Add ADRE (Autonomous Database Reliability Engineer) integration
theTibi 25b5419
Update ADRE settings and improve error handling
theTibi e8ba94f
Enhance ADRE functionality and improve error handling
theTibi b7a9924
Update ChatStream endpoint URL and improve error messages
theTibi ca601f7
Refactor ADRE handlers to integrate Grafana Alertmanager
theTibi bbe3ecb
Enhance ADRE client to support authentication and update README
theTibi 8f6f9cd
Enhance AdreAlertsPanel payload structure for investigations
theTibi 3560422
Enhance AdreChat functionality to support reasoning and improve data …
theTibi f2b15fc
Refactor AdrePage layout to enhance UI presentation
theTibi 092ead2
Implement ADRE streaming endpoint with extended timeout and enhance A…
theTibi 398db42
Enhance ADRE settings and chat functionality
theTibi bd96bce
Add ADRE_URL environment variable and enhance validation tests
theTibi 629d6a9
Refine DefaultInvestigationPrompt for clarity and usability
theTibi ac1eb15
Enhance ADRE settings and UI components for improved user experience
theTibi ee1368f
Remove redundant Stack component from AdrePage layout for cleaner UI …
theTibi de67142
Add custom tool integration and progress tracking to HolmesGPT
theTibi 862b0cd
Reapply "PMM-14843: hide sidebar on headless mode for renderer (#5077…
fabio-silva 3643ca7
Merge remote-tracking branch 'origin/tibi-test' into tibi-holmes
theTibi 429d4e9
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva 80dbee9
feat(investigations): phase1-ui — list + detail UI, basic blocks, com…
theTibi 3a3a89e
feat(investigations): phase2-orch — orchestrator service, Ollama clie…
theTibi 128a0ed
feat(investigations): phase2-chat — POST :id/chat, persist messages, …
theTibi 4fb9aaa
feat(investigations): phase2-run — Run investigation button + POST :i…
theTibi fa125ac
feat(investigations): phase3-holmes — holmes_investigate tool, adapte…
theTibi 042a5f8
feat(investigations): phase4 — block type constants and frontend bloc…
theTibi 575600f
feat(investigations): phase5 — GET :id/export/pdf HTML report, Export…
theTibi ea85d30
feat(investigations): phase6 — UX: status workflow, block reorder/del…
theTibi 62588ea
feat(investigations): add initial ADRs and data model for AI investig…
theTibi 377b27e
fix(investigations): update config structure in PanelBlock component …
theTibi 6844d15
feat(adre): add AI Assistant settings page and integrate with navigation
theTibi 6ae02e7
feat(adre): enhance settings and chat functionality with orchestrator…
theTibi a9abdb7
feat(adre): enhance PMM Agent functionality and settings management
theTibi 1cce307
feat(adre): improve chat client timeout handling and enhance conversa…
theTibi 9e2f507
feat(investigations): enhance investigation listing and settings mana…
theTibi e4eedfb
refactor(adre): simplify chat configuration and UI labels
theTibi bd20e4d
feat(investigations): enhance investigation report formatting and tim…
theTibi bb5d5c0
feat(investigations): ensure system role in conversation history for …
theTibi dee935c
feat(investigations): implement investigation deletion and related cl…
theTibi a1e9b10
feat(adre): enhance investigation reporting and alert management
theTibi 50b5634
Merge branch 'tibi-test' into tibi-holmes
theTibi 461f146
fix(investigations): update import path for Percona UI components
theTibi 599c38f
Merge branch 'tibi-test' into tibi-holmes
theTibi 36a1ef2
feat(grafana): add Grafana render API support and enhance UI integration
theTibi fcc61a9
Format Go code (struct alignment, imports, indentation)
theTibi 6a45881
Enhance settings tests and update Otel config tests
theTibi 55cb125
feat(grafana): enhance Grafana panel rendering and URL handling
theTibi fa6c741
Refactor AdrePage and AdreChatPanel components for improved UI consis…
theTibi ef5e7c6
feat(grafana): implement optional disk caching for Grafana render images
theTibi cec1724
feat(grafana): add support for Grafana panel rendering with caching
theTibi f726b1e
feat(adre): enhance link styling in AdreChatPanel for improved UI
theTibi ba00e2e
feat(adre): add alert metadata extraction utility and integrate into …
theTibi 718c0de
fix(investigation): refine alert metadata fetching logic in Investiga…
theTibi a4d5424
feat(investigation): enhance investigation export PDF styling and met…
theTibi 7af269e
feat(investigation): enhance investigation handling and metadata display
theTibi 2426e83
feat(prompts): enhance workload investigation guidance and recommenda…
theTibi 1ec0425
feat(qan): introduce QAN AI Insights feature and enhance settings man…
theTibi ed5162b
feat(adre): enhance Grafana panel rendering and link handling in Adre…
theTibi 9458f77
feat(investigation): enhance investigation creation and alert metadat…
theTibi d76670b
refactor(qan-header): remove AI Insights button and clean up imports
theTibi 53455b6
feat(settings): add ReplaceSystemPrompt option to settings and update…
theTibi b546aab
feat(adre): enhance workload and anomaly detection guidance in prompts
theTibi c592d30
feat(servicenow): integrate ServiceNow ticketing system into investig…
theTibi f8917b1
feat(adre): add Chip component to AdreSettingsPage for enhanced UI
theTibi 6f2ef3a
feat(investigation): add ServiceNow ticket number support and enhance…
theTibi 7874397
feat(adre): implement QAN insights caching and enhance API handling
theTibi 0e7711d
feat(adre): add PromptMaxBytes setting and enhance prompt validation
theTibi b6c1095
feat(adre): enhance Prometheus metric discovery and anomaly detection…
theTibi da2f014
fix(adre): refine scroll root reference in AdreChatWidget
theTibi 19cf9bb
feat(adre): enhance Grafana context handling in chat widget
theTibi 8bed343
feat(adre): update README and documentation for AI features integration
theTibi 4b83484
feat(adre): enhance Grafana context handling in chat requests
theTibi 1475828
feat(adre): implement leading system message for conversation history
theTibi ff8d40d
feat(adre): update ADRE settings and behavior controls
theTibi 11d415e
feat(adre): handle error events in chat stream and improve error mess…
theTibi 9de970d
feat(adre): add chat model settings and validation
theTibi 663104c
feat(adre): improve chat error handling and user feedback
theTibi 4aa6e34
feat(adre): enhance database parameter safety guidelines
theTibi b7006a6
feat(adre): add QAN Insights model configuration and validation
theTibi 121eea3
feat(adre): update default chat mode to investigation
theTibi 134ce5f
Merge branch 'tibi-test' into tibi-holmes
theTibi 2201220
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva 97d93a7
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva 979bb93
feat(adre): add ServiceNow integration for QAN insights
theTibi cf76a44
feat(adre): enhance QAN insights handling and introduce frontend tools
theTibi 3991c45
Merge branch 'tibi-test' into tibi-holmes
theTibi c6f70c5
Bump codecov/codecov-action from 5.5.2 to 6.0.0 (#5210)
dependabot[bot] c8246ba
Bump docker/login-action from 4.0.0 to 4.1.0 (#5211)
dependabot[bot] 75f106c
PMM-14856 Make multiline output of telemetry metrics (#5171)
maxkondr 4dc0e92
chore: bump grafana to v12.4.2
fabio-silva 9ead0eb
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva 224eaaf
fix: tests
fabio-silva df982c7
PMM-14951 PMM-14957 Dashboard variable sharing fix (#5177)
matejkubinec a0e36cd
Merge branch 'v3' into PMM-14213-grafana-12.3.1
fabio-silva 5faefa3
PMM-14825 Fix tests related to disabling telemetry in FB (#5213)
ademidoff 38c9b56
replace context.Background() with t.Context() in managed test files (…
Copilot 3bcc2e4
Merge branch 'tibi-test' into tibi-holmes
theTibi 4652505
fix: dependencies
fabio-silva 2c8cf07
Merge branch 'PMM-14213-grafana-12.3.1' of https://github.com/percona…
fabio-silva d2bcd60
Encryption - fix typos and improve clarity (#4705)
ademidoff 291c4f5
PMM-14940 Open external links in iframe in new tab (#5169)
matejkubinec 94b0e8a
Merge branch 'v3' into PMM-14213-grafana-12.3.1
ademidoff 81ff348
feat(kubernetes): add reference DaemonSet and README for coroot-node-…
theTibi 484f215
Merge branch 'PMM-14213-grafana-12.3.1' into tibi-holmes (Grafana 12.…
theTibi 641c841
feat(pmm-service-map): add new service map panel and related configur…
theTibi 027b570
Update PMM service map panel and related configurations
theTibi 873063c
feat(aiInsights): add AI Insights tab and related functionality
theTibi File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,152 @@ | ||
| # Autonomous Database Reliability Engineer (ADRE) / HolmesGPT Integration | ||
|
|
||
| ADRE integrates [HolmesGPT](https://holmesgpt.dev) with PMM to provide AI-assisted database reliability analysis, chat, and alert investigation. | ||
|
|
||
| This branch targets **HolmesGPT 0.22+**: PMM uses **`POST /api/chat` only** (no `/api/investigate`), and tunes behaviour via **`behavior_controls`** in settings. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - HolmesGPT running in a container (or elsewhere) and reachable from the PMM server | ||
| - Optional: [mcp-clickhouse](https://github.com/ClickHouse/mcp-clickhouse) for ClickHouse/otel.logs/QAN analysis | ||
|
|
||
| ## Configuration | ||
|
|
||
| 1. Enable ADRE in **PMM Settings** (Configuration → Settings → Advanced) or on the ADRE / AI Assistant page (admin only). | ||
| 2. Set the **HolmesGPT base URL** to a reachable HTTPS (or HTTP in lab) origin, for example `https://holmes.example.internal` — **do not** commit real hosts or secrets to documentation. | ||
| 3. If HolmesGPT requires authentication, configure it through **PMM settings** (preferred) or follow HolmesGPT’s documented URL/header patterns. **Never** paste API keys, Grafana tokens, or passwords into public docs or chat logs. | ||
|
|
||
| HolmesGPT and PMM must be able to communicate. If using Docker or Kubernetes, ensure network policies and TLS match your security requirements. | ||
|
|
||
| ### Fast vs Investigation (`default_chat_mode`, `mode` on chat) | ||
|
|
||
| The ADRE panel and `POST /v1/adre/chat` use **Fast** (quick answers, minimal runbooks/TodoWrite by default) vs **Investigation** (full investigation behaviour). Differences are driven by Holmes **`behavior_controls`** maps stored in PMM settings (`behavior_controls_fast`, `behavior_controls_investigation`) plus separate **`additional_system_prompt`** texts (`chat_prompt`, `investigation_prompt`). See [Holmes fast mode / prompt controls](https://holmesgpt.dev/dev/reference/http-api/?h=fast#fast-mode--prompt-controls). | ||
|
|
||
| A third map, **`behavior_controls_format_report`**, applies only to the investigation report formatting pass. | ||
|
|
||
| **`adre_max_conversation_messages`** caps how many messages PMM sends as `conversation_history` to Holmes (mitigates context overflow when Holmes fails fast on oversized prompts). | ||
|
|
||
| **`ENABLED_PROMPTS` on the Holmes container** can override what the HTTP API is allowed to enable; if operators set it restrictively, PMM behaviour-control toggles may appear ineffective — document this next to AI Assistant settings for your environment. | ||
|
|
||
| Investigations and QAN insights call the Holmes client against **`Adre.URL`** only (no separate PMM Agent path). | ||
|
|
||
| ## HolmesGPT Configuration | ||
|
|
||
| Configure HolmesGPT to use PMM data sources: | ||
|
|
||
| - **Prometheus**: `https://<pmm-host>/victoriametrics/` (with auth if required) | ||
| - **Alertmanager**: `https://<pmm-host>/prometheus/alerts` (or internal URL if same network) | ||
|
|
||
| ## ClickHouse (Logs, QAN) | ||
|
|
||
| HolmesGPT has no built-in ClickHouse toolset. To enable log and QAN analysis: | ||
|
|
||
| 1. Run [mcp-clickhouse](https://github.com/ClickHouse/mcp-clickhouse) in a container | ||
| 2. Point it at PMM’s ClickHouse (host, port, user, password must be reachable from HolmesGPT) | ||
| 3. Add it as an MCP server in HolmesGPT config (streamable-http transport) | ||
| - Example: `url: "http://mcp-clickhouse:8000/mcp/messages"`, `mode: streamable-http` | ||
|
|
||
| PMM does not run or configure mcp-clickhouse; you manage it and HolmesGPT configuration yourself. | ||
|
|
||
| ## Adding custom tools to HolmesGPT | ||
|
|
||
| HolmesGPT supports two ways to add your own tools: | ||
|
|
||
| ### 1. Custom toolsets (YAML) | ||
|
|
||
| Define tools as shell commands in a `toolsets.yaml` file. Each tool has a `name`, `description`, and `command`; the LLM infers parameters from `{{ variable }}` placeholders. Use this for scripts, `curl` calls to APIs, or `kubectl`/CLI commands. | ||
|
|
||
| - **CLI:** `holmes ask "your question" --custom-toolsets=toolsets.yaml`; after editing run `holmes toolset refresh`. | ||
| - **Helm:** Configure under `holmes.customToolsets` in your values. | ||
|
|
||
| See [HolmesGPT Custom Toolsets](https://holmesgpt.dev/data-sources/custom-toolsets/). | ||
|
|
||
| ### 2. MCP servers (recommended for new integrations) | ||
|
|
||
| Implement an [MCP](https://modelcontextprotocol.io/) server that exposes tools; HolmesGPT connects to it and discovers tools dynamically. | ||
|
|
||
| - **Transport:** Prefer `streamable-http`: your server exposes an HTTP endpoint (e.g. `http://your-mcp:8000/mcp/messages`); HolmesGPT calls it with `mode: streamable-http`. | ||
| - **Config:** Add the server under `mcp_servers` in `~/.holmes/config.yaml` or in Helm under `holmes.mcp_servers`, with `config.url`, `config.mode`, optional `config.headers`, and `llm_instructions` (when/how the LLM should use it). | ||
|
|
||
| Example (config file): | ||
|
|
||
| ```yaml | ||
| mcp_servers: | ||
| my_tools: | ||
| description: "My custom PMM tools" | ||
| config: | ||
| url: "http://my-mcp-server:8000/mcp/messages" | ||
| mode: streamable-http | ||
| llm_instructions: "Use these tools for schema, EXPLAIN, and index inspection when investigating database issues." | ||
| ``` | ||
|
|
||
| If your MCP server runs inside or alongside PMM, ensure HolmesGPT can reach it (network, auth, and security as discussed earlier). | ||
|
|
||
| See [HolmesGPT MCP Servers](https://holmesgpt.dev/data-sources/remote-mcp-servers/). | ||
|
|
||
| ## Grafana context in ADRE Chat (PMM UI) | ||
|
|
||
| The PMM shell builds **structured Grafana context** when the user is on Grafana routes (`/graph/d/...`, `d-solo`, `explore`, etc.): normalized path, dashboard UID, `viewPanel` when present, `from`/`to`, `var-*` parameters, optional **document title** from the iframe. Implementation: `ui/apps/pmm/src/components/adre/grafana-context.ts` (fragment; `GrafanaProvider` supplies `grafanaDocumentTitle`). | ||
|
|
||
| The UI sends it as **`dashboard_context`** on `POST /v1/adre/chat`. **pmm-managed** appends it to Holmes **`additional_system_prompt`** (alongside the mode-specific prompt). | ||
|
|
||
| ## Holmes operator configuration (not shipped inside PMM) | ||
|
|
||
| PMM **does not** ship `holmes_config.yaml` or Markdown **runbooks** in the repository. Operators maintain them on the **HolmesGPT** deployment: | ||
|
|
||
| - **Toolsets** — Often defined in YAML (custom toolsets) or via **MCP** servers. Point Prometheus/VictoriaMetrics, PMM inventory tools, ClickHouse (QAN/logs), and optional `curl` tools at URLs reachable from Holmes (see [HolmesGPT docs](https://holmesgpt.dev)). | ||
| - **Runbooks** — Markdown files plus a **catalog** (e.g. `catalog.json`) so the `fetch_runbook` tool can load steps. Paths are configured in Holmes, not in PMM. | ||
| - **PMM-facing URLs** — Use a **browser-reachable** PMM base URL for markdown images and Grafana links where Holmes embeds `/v1/grafana/render` or `/graph/...`. | ||
|
|
||
| ## `GET /v1/grafana/render` (panel image proxy) | ||
|
|
||
| Served by **pmm-managed**. Used by Holmes toolsets or scripts to fetch a **PNG** of a dashboard panel or to return **JSON** with URLs for the PMM UI. | ||
|
|
||
| **Required query parameters:** `dashboard_uid`, `panel_id`, `from`, `to`. | ||
|
|
||
| **Common optional parameters:** `width`, `height`, `format=json` (returns JSON with `image_url` and `dashboard_url` instead of raw PNG), `cache=1` (optional **disk cache** under `/srv/pmm/grafana_render_cache` on the server), `tz`, and any `var-*` Grafana template variables needed for the dashboard (e.g. `var-service_id`). | ||
|
|
||
| **Validation:** `dashboard_uid` and `panel_id` must match safe character classes enforced by the handler. | ||
|
|
||
| **Auth:** Forwarding uses the caller’s `Authorization` header when calling Grafana’s render path. | ||
|
|
||
| For **end-user** documentation, panel-image behaviour is intentionally **not** expanded in MkDocs; this section is for **integrators**. | ||
|
|
||
| ## Grafana panel render and dashboard links (Holmes / tools) | ||
|
|
||
| When Holmes (or a tool) renders a Grafana panel image via PMM’s render API and includes an “Open in Grafana” link in the same message, follow this contract so the UI shows one correct link per panel: | ||
|
|
||
| 1. **Use the render tool’s `dashboard_url`.** When the render tool (e.g. calling PMM `GET /v1/grafana/render?format=json`) returns `image_url` and `dashboard_url`, the model must use that exact `dashboard_url` for any “Open in Grafana” (or “Open the … panel”) link in the same message as the panel image. Do not construct the dashboard link from other parameters or default time ranges; otherwise the link can have the wrong timeframe. | ||
|
|
||
| 2. **Match panel to narrative.** The panel id (and dashboard) used for the render must match what the model describes (e.g. if the answer says “QPS graph”, the rendered panel must be the QPS panel, not a different one like “MySQL Connections”). | ||
|
|
||
| 3. **Duplicate links are suppressed by PMM.** Duplicate “Open in Grafana” links in markdown are suppressed by the PMM UI when they refer to a panel that already has a render image in the message; the only link shown is the one under the image (with the correct timeframe). So one link per panel from the render tool response is enough. | ||
|
|
||
| ## API | ||
|
|
||
| PMM proxies requests to HolmesGPT where noted. Endpoints **require PMM authentication** unless stated otherwise. | ||
|
|
||
| | Method | Path | Description | | ||
| |--------|------|-------------| | ||
| | GET | /v1/adre/settings | Get ADRE settings (Holmes URL, `behavior_controls_*`, prompts, `adre_max_conversation_messages`, QAN prompt display fields, ServiceNow configured flag — no secrets in GET) | | ||
| | POST | /v1/adre/settings | Update ADRE settings (admin); may set `servicenow_url`, `servicenow_api_key`, `servicenow_client_token` — store securely | | ||
| | GET | /v1/adre/models | List available models from HolmesGPT when ADRE enabled | | ||
| | POST | /v1/adre/chat | Chat; `stream: true` for SSE streaming; optional `mode`: `fast` or `investigation` (legacy `chat` treated as `fast`); optional `dashboard_context` merged into Holmes `additional_system_prompt` | | ||
| | GET | /v1/adre/alerts | Firing alerts from Grafana Alertmanager (ADRE enabled) | | ||
| | POST | /v1/adre/qan-insights | Body: `service_id`, `query_text` (required); optional `query_id`, `fingerprint`, `time_from`, `time_to`, `force`. Returns analysis JSON; caches by `(query_id, service_id)` when `query_id` set | | ||
| | GET | /v1/adre/qan-insights | Query params: `query_id`, `service_id` — returns cached analysis or 404 | | ||
| | GET | /v1/grafana/render | Panel PNG or JSON (`format=json`); see section above | | ||
|
|
||
| **Investigations** live under `/v1/investigations/*` — see [dev/investigations/README.md](../investigations/README.md). | ||
|
|
||
| ### End-to-end flow (mermaid) | ||
|
|
||
| ```mermaid | ||
| sequenceDiagram | ||
| participant User as PMM_UI | ||
| participant PMM as pmm_managed | ||
| participant Holmes as HolmesGPT | ||
| User->>PMM: POST /v1/adre/chat | ||
| PMM->>Holmes: Chat API | ||
| Holmes-->>PMM: analysis stream | ||
| PMM-->>User: SSE or JSON | ||
| ``` |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,94 @@ | ||
| # PMM Investigations (developer / operator notes) | ||
|
|
||
| **Investigations** are persisted incident pages under `/v1/investigations` in **pmm-managed**. The UI lists investigations, shows block-based reports, supports chat, **Run investigation**, **PDF export**, and optional **ServiceNow** ticket creation. | ||
|
|
||
| This file is **not** part of the published Percona MkDocs site; it lives next to the Go sources for contributors and operators. | ||
|
|
||
| ## Architecture reference | ||
|
|
||
| - **ADR-001** — [0001-pmm-ai-investigations.md](../../documentation/docs/adr/0001-pmm-ai-investigations.md) (original orchestrator/Ollama narrative; see note below). | ||
| - **ADR-002** — [0002-investigations-data-model-and-api.md](../../documentation/docs/adr/0002-investigations-data-model-and-api.md) (data model and REST shape). | ||
|
|
||
| **Implementation note:** Investigation **chat** and **run** use **HolmesGPT** only (`adre.NewClient(settings.GetAdreURL())`): `POST /api/chat` with `investigation_prompt`, **`behavior_controls_investigation`**, and (for the formatting pass) **`behavior_controls_format_report`**. A separate Ollama orchestrator process is **not** required for that deployment model. ADR-001 remains historical context; align product docs with the code path you ship. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - **HolmesGPT URL** configured in PMM **AI Assistant / ADRE** settings (`GetAdreURL()` non-empty). Chat and run return HTTP 400 if missing. | ||
|
|
||
| ## REST API summary | ||
|
|
||
| All routes are prefixed with `/v1/investigations`. Authenticate like other PMM APIs. | ||
|
|
||
| | Method | Path pattern | Purpose | | ||
| | ------ | ------------ | ------- | | ||
| | GET | `/v1/investigations` | List investigations | | ||
| | POST | `/v1/investigations` | Create investigation | | ||
| | GET | `/v1/investigations/:id` | Get one | | ||
| | PATCH | `/v1/investigations/:id` | Update metadata / status | | ||
| | DELETE | `/v1/investigations/:id` | Delete | | ||
| | GET/POST | `/v1/investigations/:id/blocks` | List / create blocks | | ||
| | PATCH/DELETE | `/v1/investigations/:id/blocks/:blockId` | Update / delete block | | ||
| | GET/POST | `/v1/investigations/:id/timeline` | Timeline events | | ||
| | GET/POST | `/v1/investigations/:id/artifacts` | Artifacts | | ||
| | GET/POST | `/v1/investigations/:id/comments` | Comments | | ||
| | GET | `/v1/investigations/:id/messages` | Chat message history | | ||
| | POST | `/v1/investigations/:id/chat` | One chat round (Holmes `/api/chat`) | | ||
| | POST | `/v1/investigations/:id/run` | Start background **Run investigation** (202 Accepted) | | ||
| | GET | `/v1/investigations/:id/export/pdf` | Download PDF report | | ||
| | POST | `/v1/investigations/:id/servicenow` | Create ServiceNow ticket (requires settings) | | ||
|
|
||
| Details and JSON shapes: **ADR-002** and `managed/services/investigations/handlers.go`. | ||
|
|
||
| ## Chat flow (`POST .../chat`) | ||
|
|
||
| 1. Load investigation; validate Holmes URL. | ||
| 2. Persist the user `message`. | ||
| 3. Build `conversation_history` from stored messages (roles `user`, `assistant`, `tool`). | ||
| 4. Call `adre.Client.Chat` with investigation context, **`behavior_controls_investigation`**, and trimmed history (`adre_max_conversation_messages`). | ||
| 5. Persist assistant reply; return `{ "content": "..." }`. | ||
|
|
||
| ## Run investigation (`POST .../run`) | ||
|
|
||
| Returns **202** immediately; work continues in `runInvestigationBackground`: | ||
|
|
||
| 1. Calls Holmes **`Chat`** (`/api/chat`) with a structured ask, investigation prompt, context, and **`behavior_controls_investigation`**. | ||
| 2. **`FormatInvestigationReport`** — second LLM pass via `adre.Client.Chat` with **`behavior_controls_format_report`** to normalize markdown into JSON sections. | ||
| 4. **`ParseFormattedReport`** — creates **blocks** and **timeline** rows; updates investigation summary fields. | ||
|
|
||
| Timeouts: **5 minutes** for run and chat (see `investigationRunTimeout` / `investigationChatTimeout` in `chat.go`). | ||
|
|
||
| ## ServiceNow (`POST .../servicenow`) | ||
|
|
||
| Requires **non-empty** `Adre.ServiceNowURL`, `ServiceNowAPIKey`, and `ServiceNowClientToken` in PMM settings (set via `POST /v1/adre/settings`). The handler POSTs JSON to the configured create URL and sets header **`x-sn-apikey`** from the API key field. **Do not** log or document real values. | ||
|
|
||
| ## PDF export | ||
|
|
||
| `GET /v1/investigations/:id/export/pdf` returns an HTML-based report suitable for PDF conversion in the UI pipeline (see `managed/services/investigations/export.go`). | ||
|
|
||
| ## Related code | ||
|
|
||
| | Area | Path | | ||
| | ---- | ---- | | ||
| | HTTP dispatch | `managed/services/investigations/handlers.go` | | ||
| | Chat + run + background | `managed/services/investigations/chat.go` | | ||
| | ServiceNow | `managed/services/investigations/servicenow.go` | | ||
| | Report formatting | `managed/services/investigations/format_report.go` | | ||
| | Holmes client | `managed/services/adre/client.go` | | ||
|
|
||
| ## End-to-end sequence (mermaid) | ||
|
|
||
| ```mermaid | ||
| sequenceDiagram | ||
| participant UI as PMM_UI | ||
| participant PMM as pmm_managed | ||
| participant Holmes as HolmesGPT | ||
| UI->>PMM: POST /v1/investigations/:id/run | ||
| PMM-->>UI: 202 Accepted | ||
| PMM->>Holmes: Chat (/api/chat) | ||
| Holmes-->>PMM: analysis markdown | ||
| PMM->>Holmes: Format report (Chat) | ||
| Holmes-->>PMM: structured JSON | ||
| PMM->>PMM: Persist blocks and timeline | ||
| ``` | ||
|
|
||
| User-facing overview: [investigations.md](../../documentation/docs/use/ai-features/investigations.md). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| # ADR-001: PMM AI Investigations | ||
|
|
||
| ## Status | ||
|
|
||
| Accepted. | ||
|
|
||
| ## Context | ||
|
|
||
| PMM needs a first-class Investigations feature that combines: | ||
|
|
||
| - A configurable local LLM (Ollama by default) as the orchestrator for the user-facing chat. | ||
| - HolmesGPT as a tool the orchestrator can call for observability and database analysis. | ||
| - Persistent incident pages (reports) with blocks, comments, chat, and PDF export. | ||
| - Clear separation: normal chat is Q&A only; full investigation/report is triggered by an explicit "Run investigation" action and may involve a multi-turn loop between the orchestrator and HolmesGPT. | ||
|
|
||
| Existing ADRE (HolmesGPT) integration provides the HolmesGPT client and alerts; it does not provide persistent investigations, block-based reports, or orchestrator-driven routing. | ||
|
|
||
| ## Decision | ||
|
|
||
| - **Orchestrator**: Stateless service that receives investigation context and chat messages, calls a configurable LLM (Ollama default) with a tool registry. The LLM decides when to call HolmesGPT vs other tools vs answer directly (routing via tool definitions and system prompt). | ||
| - **Investigations API**: REST API under `/v1/investigations` for CRUD on investigations, blocks, timeline, artifacts, comments, and messages. `POST /v1/investigations/:id/chat` invokes the orchestrator; `POST /v1/investigations/:id/run` (or equivalent) runs the full multi-turn investigation loop. | ||
| - **Data model**: New tables for investigations, investigation_blocks, investigation_artifacts, investigation_messages, investigation_comments, investigation_timeline_events. Blocks are ordered and typed (summary, timeline, single_panel, panel_group, logs_view, query_result, finding, markdown, etc.); content varies per incident. | ||
| - **No backward compatibility**: Replace ADRE direct-chat/investigate UX with Investigations; remove or make internal-only endpoints that are no longer needed. | ||
| - **Config**: Orchestrator LLM configurable via env vars (`PMM_ORCHESTRATOR_LLM_PROVIDER`, `PMM_ORCHESTRATOR_LLM_URL`, `PMM_ORCHESTRATOR_LLM_MODEL`) and PMM settings (stored in extended Adre or dedicated settings section). | ||
|
|
||
| ## Consequences | ||
|
|
||
| - Single Incident Detail Page component; report content is data-driven (blocks from API). | ||
| - HolmesGPT is used as a tool; no change to HolmesGPT itself. | ||
| - Operators must run Ollama (or another configured LLM) for Investigations chat and "Run investigation" to work. | ||
|
|
||
| ## Implementation note (tibi-holmes / current tree) | ||
|
|
||
| The shipped UI includes **both** **ADRE Chat** (floating widget) and **Investigations**; ADRE direct chat was not removed. | ||
|
|
||
| Investigation **chat** and **run** are implemented against the configured **HolmesGPT** URL (`adre.Client`) via **`POST /api/chat`**, with prompts and **`behavior_controls`** from PMM settings — not a separate in-repo Ollama orchestrator service. See `managed/services/investigations/chat.go` and [dev/investigations/README.md](https://github.com/percona/pmm/blob/v3/dev/investigations/README.md) for the actual request flow. | ||
|
|
||
| End-user overview: [AI features — Investigations](../use/ai-features/investigations.md). | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 [linkspector] reported by reviewdog 🐶
Cannot reach https://github.com/percona/pmm/blob/v3/dev/investigations/README.md Status: 404