Merged
42 changes: 37 additions & 5 deletions aws-observability/POWER.md
@@ -1,8 +1,8 @@
---
name: "aws-observability"
displayName: "AWS Observability"
description: "Comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, and automated codebase observability gap analysis, for complete monitoring, troubleshooting, and optimization."
keywords: ["cloudwatch", "logs", "metrics", "traces", "alarms", "alerts", "monitoring", "observability", "application signals", "apm", "distributed tracing", "x-ray", "opentelemetry", "otel", "slow", "latency", "performance", "bottleneck", "degradation", "timeout", "high latency", "slow api", "api performance", "service performance", "response time", "p50", "p90", "p95", "p99", "errors", "error rate", "fault rate", "failure rate", "5xx", "4xx", "exceptions", "availability", "uptime", "downtime", "outage", "sev1", "sev2", "slo", "sli", "service level", "error budget", "breach", "troubleshooting", "root cause", "rca", "investigate", "diagnose", "log analysis", "log insights", "log query", "log patterns", "audit", "cloudtrail", "security audit", "access logs", "iam changes", "change events", "service map", "cascading failure", "canary", "synthetic monitoring", "health check", "observability gaps", "missing instrumentation", "monitoring instrumentation", "structured logging", "silent failures", "logging gaps", "alarm investigation", "trace analysis", "span analysis", "request tracing"]
description: "Comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, Amazon Managed Prometheus (AMP) metric querying, and automated codebase observability gap analysis, for complete monitoring, troubleshooting, and optimization."
keywords: ["cloudwatch", "logs", "metrics", "traces", "alarms", "alerts", "monitoring", "observability", "application signals", "apm", "distributed tracing", "x-ray", "opentelemetry", "otel", "slow", "latency", "performance", "bottleneck", "degradation", "timeout", "high latency", "slow api", "api performance", "service performance", "response time", "p50", "p90", "p95", "p99", "errors", "error rate", "fault rate", "failure rate", "5xx", "4xx", "exceptions", "availability", "uptime", "downtime", "outage", "sev1", "sev2", "slo", "sli", "service level", "error budget", "breach", "troubleshooting", "root cause", "rca", "investigate", "diagnose", "log analysis", "log insights", "log query", "log patterns", "audit", "cloudtrail", "security audit", "access logs", "iam changes", "change events", "service map", "cascading failure", "canary", "synthetic monitoring", "health check", "observability gaps", "missing instrumentation", "monitoring instrumentation", "structured logging", "silent failures", "logging gaps", "alarm investigation", "trace analysis", "span analysis", "request tracing", "prometheus", "promql", "amp", "amazon managed prometheus", "prometheus metrics", "prometheus query", "prometheus workspace"]
author: "AWS"
---

@@ -58,6 +58,13 @@ author: "AWS"
- "audit codebase", "check instrumentation", "observability gaps"
- "missing logs", "improve observability"

### 📈 Prometheus / AMP Metrics → `prometheus-metrics.md`

**Load when user mentions:**
- "prometheus", "promql", "AMP", "managed prometheus"
- "prometheus metrics", "prometheus query", "prometheus workspace"
- "PromQL range query", "list metrics"

### ⚙️ Application Signals Setup → `application-signals-setup.md`

**Load when user mentions:**
@@ -115,6 +122,7 @@ The comprehensive AWS observability platform combining monitoring, troubleshooti
- **CloudWatch Logs** - Query and analyze logs using CloudWatch Logs Insights
- **Metrics & Alarms** - Metric querying with Metrics Insights and intelligent alarm recommendations
- **Application Signals** - APM with distributed tracing, service maps, SLOs, and enablement guides
- **Amazon Managed Prometheus** - PromQL queries against AMP workspaces with SigV4 authentication
- **Codebase Observability Analysis** - Automated analysis of codebases to identify observability gaps
- **CloudTrail Integration** - Security auditing and compliance tracking
- **AWS Documentation** - Direct access to official AWS docs for troubleshooting
@@ -214,7 +222,25 @@ See `cloudtrail-data-source-selection.md` steering file for detailed decision tr
- Detecting unauthorized access attempts
- Root cause analysis for configuration changes

### 5. Codebase Observability Analysis
### 5. Amazon Managed Prometheus (AMP)

**Primary Use Case**: Query and analyze Prometheus metrics from Amazon Managed Prometheus (AMP) workspaces

**Key Features**:
- Execute instant and range PromQL queries
- List available metrics in a workspace
- Discover and manage AMP workspaces
- AWS SigV4 authentication for secure access
- Retrieve server configuration details

**When to Use**:
- Querying Prometheus metrics from AMP workspaces
- Analyzing container and Kubernetes metrics
- Running PromQL range queries for trend analysis
- Listing available metrics across workspaces
- Correlating Prometheus metrics with CloudWatch data

### 6. Codebase Observability Analysis

**Primary Use Case**: Automated analysis of application codebases to identify observability gaps

@@ -236,7 +262,7 @@ See `cloudtrail-data-source-selection.md` steering file for detailed decision tr
- Establishing observability baselines
- Training teams on observability patterns

### 6. AWS Documentation Access
### 7. AWS Documentation Access

**Primary Use Case**: Quick access to official AWS documentation

@@ -280,6 +306,9 @@ Step-by-step Application Signals enablement guide.
### 8. `cloudtrail-data-source-selection.md`
CloudTrail data source priority logic (referenced by security-auditing.md).

### 9. `prometheus-metrics.md`
PromQL querying, metric exploration, and AMP workspace management.

## Quick Start Examples

### Example 1: Investigate High Error Rate
@@ -338,11 +367,14 @@ Application Signals APM with service health, SLOs, and distributed tracing.
### awslabs.cloudtrail-mcp-server
CloudTrail security auditing and API activity tracking.

### awslabs.prometheus-mcp-server
Amazon Managed Prometheus PromQL queries, metric listing, and workspace management.

### awslabs.aws-documentation-mcp-server
Search and read official AWS documentation.

## License
This power integrates with CloudWatch MCP Server, CloudWatch Application Signals MCP Server, CloudTrail MCP Server, and AWS Documentation MCP Server from [AWS Labs](https://github.com/awslabs/mcp) (Apache-2.0 license). All steering files and power configuration are licensed under Apache-2.0.
This power integrates with CloudWatch MCP Server, CloudWatch Application Signals MCP Server, CloudTrail MCP Server, Prometheus MCP Server, and AWS Documentation MCP Server from [AWS Labs](https://github.com/awslabs/mcp) (Apache-2.0 license). All steering files and power configuration are licensed under Apache-2.0.

---

13 changes: 13 additions & 0 deletions aws-observability/mcp.json
@@ -40,6 +40,19 @@
},
"transportType": "stdio"
},
"awslabs.prometheus-mcp-server": {
"command": "uvx",
"args": [
"awslabs.prometheus-mcp-server@latest"
],
"env": {
"AWS_PROFILE": "default",
"AWS_REGION": "us-east-1",
"FASTMCP_LOG_LEVEL": "ERROR"
},
"disabled": false,
"autoApprove": []
},
"awslabs.aws-documentation-mcp-server": {
"command": "uvx",
"args": [
62 changes: 62 additions & 0 deletions aws-observability/steering/prometheus-metrics.md
@@ -0,0 +1,62 @@
# Prometheus Metrics Steering File

## When to Use
Load this steering file when the user asks about:
- Querying Prometheus or AMP metrics
- PromQL queries (instant or range)
- Listing metrics in a Prometheus workspace
- Managing AMP workspaces
- Correlating Prometheus metrics with CloudWatch data

## Workflow

### Step 1: Identify the Workspace
If the user hasn't specified a workspace:
1. Use `GetAvailableWorkspaces` to list AMP workspaces
2. Present the available workspaces (ID, alias, status) and ask the user to select one

### Step 2: Understand the Query Intent
Determine what the user needs:
- **Instant query**: Current value of a metric → use `ExecuteQuery`
- **Range query**: Metric values over time → use `ExecuteRangeQuery`
- **Metric discovery**: What metrics are available → use `ListMetrics`
- **Server info**: Workspace configuration → use `GetServerInfo`
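The mapping from intent to tool can be sketched as a small lookup table. This is only an illustration: the tool names come from the list above, but the `Intent` enum, the `classify_intent` helper, and its keyword heuristics are hypothetical and not part of the Prometheus MCP server.

```python
from enum import Enum


class Intent(Enum):
    INSTANT = "instant"
    RANGE = "range"
    DISCOVERY = "discovery"
    SERVER_INFO = "server_info"


# Tool names as listed in Step 2 of this steering file.
TOOL_FOR_INTENT = {
    Intent.INSTANT: "ExecuteQuery",
    Intent.RANGE: "ExecuteRangeQuery",
    Intent.DISCOVERY: "ListMetrics",
    Intent.SERVER_INFO: "GetServerInfo",
}


def classify_intent(request: str) -> Intent:
    """Guess the intent from a user request (hypothetical keyword heuristic)."""
    text = request.lower()
    if any(k in text for k in ("over time", "trend", "last hour", "last day", "last week")):
        return Intent.RANGE
    if any(k in text for k in ("list metrics", "what metrics", "available metrics")):
        return Intent.DISCOVERY
    if any(k in text for k in ("server info", "configuration")):
        return Intent.SERVER_INFO
    # Default: treat the request as asking for the current value.
    return Intent.INSTANT
```

Defaulting to an instant query keeps the cheapest call as the fallback when the request is ambiguous.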

### Step 3: Execute the Query

#### Instant Queries
Use `ExecuteQuery` with:
- `workspace_id`: The AMP workspace ID
- `query`: The PromQL expression
- `time` (optional): Evaluation timestamp
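A minimal sketch of assembling these arguments before calling `ExecuteQuery`. The helper name `build_instant_query_args` is hypothetical, and formatting `time` as an ISO 8601 UTC string is an assumption that mirrors the range-query parameters; this file does not state the expected timestamp format.

```python
from datetime import datetime, timezone
from typing import Optional


def build_instant_query_args(
    workspace_id: str, query: str, at: Optional[datetime] = None
) -> dict:
    """Build the argument dict for an instant query (hypothetical helper)."""
    if not workspace_id:
        raise ValueError("workspace_id is required")
    if not query.strip():
        raise ValueError("query must be a non-empty PromQL expression")
    args = {"workspace_id": workspace_id, "query": query}
    if at is not None:
        if at.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")
        # Assumption: the tool accepts an ISO 8601 timestamp in UTC.
        args["time"] = at.astimezone(timezone.utc).isoformat()
    return args
```

Omitting `at` leaves `time` out of the arguments entirely, so the query is evaluated at whatever default the server applies.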

Common PromQL patterns:
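- `up`: target health (`1` if the last scrape succeeded, `0` if it failed)
- `rate(http_requests_total[5m])`: per-second request rate, averaged over the last 5 minutes
- `histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))`: P95 latency per series (wrap the `rate(...)` in `sum by (le) (...)` to aggregate across instances)
- `sum by (namespace) (container_memory_usage_bytes)`: memory usage per namespace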
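To make the `histogram_quantile` pattern less opaque when explaining results, here is a simplified Python sketch of how a quantile is estimated from cumulative histogram buckets by linear interpolation within the matching bucket. It illustrates the idea only and is not the Prometheus implementation; the function signature and the `(le, cumulative_count)` tuple layout are assumptions made for this example.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (le, cumulative_count) buckets.

    Simplified illustration: the last bucket must have le=+Inf, and
    observations are assumed to be spread evenly inside each bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last with le=+Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, cumulative) in enumerate(buckets) if cumulative >= rank)
    upper, cumulative = buckets[index]
    if math.isinf(upper):
        # The quantile falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = buckets[index - 1][0] if index > 0 else 0.0
    below = buckets[index - 1][1] if index > 0 else 0.0
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - below) / (cumulative - below)
```

Because the estimate is interpolated, its accuracy depends on how finely the bucket boundaries cover the latency range of interest.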

#### Range Queries
Use `ExecuteRangeQuery` with:
- `workspace_id`: The AMP workspace ID
- `query`: The PromQL expression
- `start`: Start time (ISO 8601)
- `end`: End time (ISO 8601)
- `step`: Resolution step (e.g., `1m`, `5m`, `1h`)

Choose step based on time range:
- Last hour → `1m` step
- Last day → `5m` step
- Last week → `1h` step
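The step guidance above can be sketched as a helper that picks a step from the span between `start` and `end`. The function names are hypothetical, and the handling of spans between the three documented points (for example, treating a 6-hour window like "last day") is an assumption.

```python
from datetime import datetime, timedelta, timezone


def choose_step(start: datetime, end: datetime) -> str:
    """Pick a resolution step from the query span (hypothetical helper)."""
    span = end - start
    if span <= timedelta(0):
        raise ValueError("end must be after start")
    if span <= timedelta(hours=1):
        return "1m"
    if span <= timedelta(days=1):
        return "5m"
    # Assumption: anything longer than a day uses the weekly guidance.
    return "1h"


def build_range_query_args(
    workspace_id: str, query: str, start: datetime, end: datetime
) -> dict:
    """Assemble ExecuteRangeQuery arguments with ISO 8601 timestamps."""
    return {
        "workspace_id": workspace_id,
        "query": query,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "step": choose_step(start, end),
    }
```

With these thresholds a one-hour window yields 60 points per series, a day yields 288, and a week yields 168, which keeps responses at a manageable size.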

### Step 4: Present Results
- Format metric values clearly with labels
- For range queries, summarize trends (increasing, decreasing, stable)
- Highlight any anomalous values
- Suggest follow-up queries if patterns warrant deeper investigation
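One possible way to derive the trend label and flag anomalous values from a range-query series is sketched below. The function names, the 5% tolerance, and the z-score threshold of 3 are illustrative assumptions, not values prescribed by this steering file.

```python
from statistics import mean, pstdev


def summarize_trend(values, tolerance=0.05):
    """Label a series by comparing the mean of its first and second halves."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    half = len(values) // 2
    first, second = mean(values[:half]), mean(values[half:])
    baseline = abs(first) or 1.0  # avoid dividing by zero for a zero baseline
    change = (second - first) / baseline
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"


def find_anomalies(values, z_threshold=3.0):
    """Return indices of samples more than z_threshold std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]
```

A half-versus-half comparison is deliberately coarse. It is enough for a one-word summary, while a flagged index is a cue to suggest a narrower follow-up query around that timestamp.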

## Cross-Tool Correlation
When investigating issues, combine Prometheus data with other observability tools:
- Use CloudWatch Logs to find error details for metrics showing elevated error rates
- Use Application Signals to correlate Prometheus metrics with distributed traces
- Use CloudTrail to check for infrastructure changes that may explain metric shifts