diff --git a/aws-observability/POWER.md b/aws-observability/POWER.md
index e0f1d6c..8c3b255 100644
--- a/aws-observability/POWER.md
+++ b/aws-observability/POWER.md
@@ -1,8 +1,8 @@
 ---
 name: "aws-observability"
 displayName: "AWS Observability"
-description: "Comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, and automated codebase observability gap analysis, for complete monitoring, troubleshooting, and optimization."
-keywords: ["cloudwatch", "logs", "metrics", "traces", "alarms", "alerts", "monitoring", "observability", "application signals", "apm", "distributed tracing", "x-ray", "opentelemetry", "otel", "slow", "latency", "performance", "bottleneck", "degradation", "timeout", "high latency", "slow api", "api performance", "service performance", "response time", "p50", "p90", "p95", "p99", "errors", "error rate", "fault rate", "failure rate", "5xx", "4xx", "exceptions", "availability", "uptime", "downtime", "outage", "sev1", "sev2", "slo", "sli", "service level", "error budget", "breach", "troubleshooting", "root cause", "rca", "investigate", "diagnose", "log analysis", "log insights", "log query", "log patterns", "audit", "cloudtrail", "security audit", "access logs", "iam changes", "change events", "service map", "cascading failure", "canary", "synthetic monitoring", "health check", "observability gaps", "missing instrumentation", "monitoring instrumentation", "structured logging", "silent failures", "logging gaps", "alarm investigation", "trace analysis", "span analysis", "request tracing"]
+description: "Comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, Amazon Managed Prometheus (AMP) metric querying, and automated codebase observability gap analysis, for complete monitoring, troubleshooting, and optimization."
+keywords: ["cloudwatch", "logs", "metrics", "traces", "alarms", "alerts", "monitoring", "observability", "application signals", "apm", "distributed tracing", "x-ray", "opentelemetry", "otel", "slow", "latency", "performance", "bottleneck", "degradation", "timeout", "high latency", "slow api", "api performance", "service performance", "response time", "p50", "p90", "p95", "p99", "errors", "error rate", "fault rate", "failure rate", "5xx", "4xx", "exceptions", "availability", "uptime", "downtime", "outage", "sev1", "sev2", "slo", "sli", "service level", "error budget", "breach", "troubleshooting", "root cause", "rca", "investigate", "diagnose", "log analysis", "log insights", "log query", "log patterns", "audit", "cloudtrail", "security audit", "access logs", "iam changes", "change events", "service map", "cascading failure", "canary", "synthetic monitoring", "health check", "observability gaps", "missing instrumentation", "monitoring instrumentation", "structured logging", "silent failures", "logging gaps", "alarm investigation", "trace analysis", "span analysis", "request tracing", "prometheus", "promql", "amp", "amazon managed prometheus", "prometheus metrics", "prometheus query", "prometheus workspace"]
 author: "AWS"
 ---
 
@@ -58,6 +58,13 @@ author: "AWS"
 - "audit codebase", "check instrumentation", "observability gaps"
 - "missing logs", "improve observability"
 
+### 📈 Prometheus / AMP Metrics → `prometheus-metrics.md`
+
+**Load when user mentions:**
+- "prometheus", "promql", "AMP", "managed prometheus"
+- "prometheus metrics", "prometheus query", "prometheus workspace"
+- "PromQL range query", "list metrics"
+
 ### ⚙️ Application Signals Setup → `application-signals-setup.md`
 
 **Load when user mentions:**
@@ -115,6 +122,7 @@ The comprehensive AWS observability platform combining monitoring, troubleshooti
 - **CloudWatch Logs** - Query and analyze logs using CloudWatch Logs Insights
 - **Metrics & Alarms** - Metric querying with Metrics Insights and intelligent alarm recommendations
 - **Application Signals** - APM with distributed tracing, service maps, SLOs, and enablement guides
+- **Amazon Managed Prometheus** - PromQL queries against AMP workspaces with SigV4 authentication
 - **Codebase Observability Analysis** - Automated analysis of codebases to identify observability gaps
 - **CloudTrail Integration** - Security auditing and compliance tracking
 - **AWS Documentation** - Direct access to official AWS docs for troubleshooting
@@ -214,7 +222,25 @@ See `cloudtrail-data-source-selection.md` steering file for detailed decision tr
 - Detecting unauthorized access attempts
 - Root cause analysis for configuration changes
 
-### 5. Codebase Observability Analysis
+### 5. Amazon Managed Prometheus (AMP)
+
+**Primary Use Case**: Query and analyze Prometheus metrics from AWS Managed Prometheus workspaces
+
+**Key Features**:
+- Execute instant and range PromQL queries
+- List available metrics in a workspace
+- Discover and manage AMP workspaces
+- AWS SigV4 authentication for secure access
+- Retrieve server configuration details
+
+**When to Use**:
+- Querying Prometheus metrics from AMP workspaces
+- Analyzing container and Kubernetes metrics
+- Running PromQL range queries for trend analysis
+- Listing available metrics across workspaces
+- Correlating Prometheus metrics with CloudWatch data
+
+### 6. Codebase Observability Analysis
 
 **Primary Use Case**: Automated analysis of application codebases to identify observability gaps
 
@@ -236,7 +262,7 @@ See `cloudtrail-data-source-selection.md` steering file for detailed decision tr
 - Establishing observability baselines
 - Training teams on observability patterns
 
-### 6. AWS Documentation Access
+### 7. AWS Documentation Access
 
 **Primary Use Case**: Quick access to official AWS documentation
 
@@ -280,6 +306,9 @@ Step-by-step Application Signals enablement guide.
 ### 8. `cloudtrail-data-source-selection.md`
 CloudTrail data source priority logic (referenced by security-auditing.md).
 
+### 9. `prometheus-metrics.md`
+PromQL querying, metric exploration, and AMP workspace management.
+
 ## Quick Start Examples
 
 ### Example 1: Investigate High Error Rate
@@ -338,11 +367,14 @@ Application Signals APM with service health, SLOs, and distributed tracing.
 ### awslabs.cloudtrail-mcp-server
 CloudTrail security auditing and API activity tracking.
 
+### awslabs.prometheus-mcp-server
+Amazon Managed Prometheus PromQL queries, metric listing, and workspace management.
+
 ### awslabs.aws-documentation-mcp-server
 Search and read official AWS documentation.
 
 ## License
 
-This power integrates with CloudWatch MCP Server, CloudWatch Application Signals MCP Server, CloudTrail MCP Server, and AWS Documentation MCP Server from [AWS Labs](https://github.com/awslabs/mcp) (Apache-2.0 license). All steering files and power configuration are licensed under Apache-2.0.
+This power integrates with CloudWatch MCP Server, CloudWatch Application Signals MCP Server, CloudTrail MCP Server, Prometheus MCP Server, and AWS Documentation MCP Server from [AWS Labs](https://github.com/awslabs/mcp) (Apache-2.0 license). All steering files and power configuration are licensed under Apache-2.0.
 
 ---
diff --git a/aws-observability/mcp.json b/aws-observability/mcp.json
index b92643e..ed30cb4 100644
--- a/aws-observability/mcp.json
+++ b/aws-observability/mcp.json
@@ -40,6 +40,19 @@
       },
       "transportType": "stdio"
     },
+    "awslabs.prometheus-mcp-server": {
+      "command": "uvx",
+      "args": [
+        "awslabs.prometheus-mcp-server@latest"
+      ],
+      "env": {
+        "AWS_PROFILE": "default",
+        "AWS_REGION": "us-east-1",
+        "FASTMCP_LOG_LEVEL": "ERROR"
+      },
+      "disabled": false,
+      "autoApprove": []
+    },
     "awslabs.aws-documentation-mcp-server": {
       "command": "uvx",
       "args": [
diff --git a/aws-observability/steering/prometheus-metrics.md b/aws-observability/steering/prometheus-metrics.md
new file mode 100644
index 0000000..6e3560a
--- /dev/null
+++ b/aws-observability/steering/prometheus-metrics.md
@@ -0,0 +1,62 @@
+# Prometheus Metrics Steering File
+
+## When to Use
+Load this steering file when the user asks about:
+- Querying Prometheus or AMP metrics
+- PromQL queries (instant or range)
+- Listing metrics in a Prometheus workspace
+- Managing AMP workspaces
+- Correlating Prometheus metrics with CloudWatch data
+
+## Workflow
+
+### Step 1: Identify the Workspace
+If the user hasn't specified a workspace:
+1. Use `GetAvailableWorkspaces` to list AMP workspaces
+2. Present the available workspaces (ID, alias, status) and ask the user to select one
+
+### Step 2: Understand the Query Intent
+Determine what the user needs:
+- **Instant query**: Current value of a metric → use `ExecuteQuery`
+- **Range query**: Metric values over time → use `ExecuteRangeQuery`
+- **Metric discovery**: What metrics are available → use `ListMetrics`
+- **Server info**: Workspace configuration → use `GetServerInfo`
+
+### Step 3: Execute the Query
+
+#### Instant Queries
+Use `ExecuteQuery` with:
+- `workspace_id`: The AMP workspace ID
+- `query`: The PromQL expression
+- `time` (optional): Evaluation timestamp
+
+Common PromQL patterns:
+- `up` — Check target health
+- `rate(http_requests_total[5m])` — Request rate
+- `histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))` — P95 latency
+- `sum by (namespace) (container_memory_usage_bytes)` — Memory by namespace
+
+#### Range Queries
+Use `ExecuteRangeQuery` with:
+- `workspace_id`: The AMP workspace ID
+- `query`: The PromQL expression
+- `start`: Start time (ISO 8601)
+- `end`: End time (ISO 8601)
+- `step`: Resolution step (e.g., `1m`, `5m`, `1h`)
+
+Choose step based on time range:
+- Last hour → `1m` step
+- Last day → `5m` step
+- Last week → `1h` step
+
+### Step 4: Present Results
+- Format metric values clearly with labels
+- For range queries, summarize trends (increasing, decreasing, stable)
+- Highlight any anomalous values
+- Suggest follow-up queries if patterns warrant deeper investigation
+
+## Cross-Tool Correlation
+When investigating issues, combine Prometheus data with other observability tools:
+- Use CloudWatch Logs to find error details for metrics showing elevated error rates
+- Use Application Signals to correlate Prometheus metrics with distributed traces
+- Use CloudTrail to check for infrastructure changes that may explain metric shifts
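
Note on the range-query guidance in `prometheus-metrics.md`: the step table (last hour → `1m`, last day → `5m`, last week → `1h`) could be sketched as a small helper like the one below. This is only an illustration under stated assumptions. The function names `choose_step` and `build_range_query_args` are hypothetical, the cut-offs between the three buckets are a guess (the steering file names only three example ranges), and the workspace ID is a placeholder. Only the argument names `workspace_id`, `query`, `start`, `end`, and `step` come from the steering file.

```python
from datetime import datetime, timedelta, timezone


def choose_step(span: timedelta) -> str:
    """Map a query time span to a resolution step.

    Assumption: spans up to one hour use 1m, up to one day use 5m,
    and anything longer uses 1h. The steering file only lists
    "last hour", "last day", and "last week" as examples.
    """
    if span <= timedelta(0):
        raise ValueError("end must be after start")
    if span <= timedelta(hours=1):
        return "1m"
    if span <= timedelta(days=1):
        return "5m"
    return "1h"


def build_range_query_args(workspace_id: str, query: str,
                           start: datetime, end: datetime) -> dict:
    """Assemble the ExecuteRangeQuery arguments named in the steering file.

    Hypothetical helper: expects timezone-aware datetimes and emits
    ISO 8601 timestamps in UTC.
    """
    return {
        "workspace_id": workspace_id,
        "query": query,
        "start": start.astimezone(timezone.utc).isoformat(),
        "end": end.astimezone(timezone.utc).isoformat(),
        "step": choose_step(end - start),
    }


if __name__ == "__main__":
    end = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    args = build_range_query_args(
        "ws-EXAMPLE",  # placeholder workspace ID, not a real one
        "rate(http_requests_total[5m])",
        end - timedelta(days=1),
        end,
    )
    print(args)
```

Keeping the step choice in one function makes the thresholds easy to adjust if the three example ranges turn out to need different boundaries.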