prometheus-configuration

Skill from rmyndharis/antigravity-skills

What it does

Configures Prometheus monitoring by setting up metric collection, scraping, recording rules, and alerting for infrastructure and applications.

Part of rmyndharis/antigravity-skills (289 items)

Installation

npm run build:catalog
npx @rmyndharis/antigravity-skills search <query>
npx @rmyndharis/antigravity-skills search kubernetes
npx @rmyndharis/antigravity-skills list
npx @rmyndharis/antigravity-skills install <skill-name>

Extracted from docs: rmyndharis/antigravity-skills
10 installs
Added Feb 4, 2026

Skill Details

SKILL.md

Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.

Overview

# Prometheus Configuration

Complete guide to Prometheus setup, metric collection, scrape configuration, and recording rules.

Do not use this skill when

  • The task is unrelated to Prometheus configuration
  • You need a different domain or tool outside this scope

Instructions

  • Clarify goals, constraints, and required inputs.
  • Apply relevant best practices and validate outcomes.
  • Provide actionable steps and verification.
  • If detailed examples are required, open resources/implementation-playbook.md.

Purpose

Configure Prometheus for comprehensive metric collection, alerting, and monitoring of infrastructure and applications.

Use this skill when

  • Setting up Prometheus monitoring
  • Configuring metric scraping
  • Creating recording rules
  • Designing alert rules
  • Implementing service discovery

Prometheus Architecture

```
┌──────────────┐
│ Applications │ ← Instrumented with client libraries
└──────┬───────┘
       │ /metrics endpoint
       ↓
┌──────────────┐
│  Prometheus  │ ← Scrapes metrics periodically
│    Server    │
└──────┬───────┘
       │
       ├─→ AlertManager (alerts)
       ├─→ Grafana (visualization)
       └─→ Long-term storage (Thanos/Cortex)
```

Installation

Kubernetes with Helm

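```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack with 30-day retention and a 50Gi volume.
# The chart sizes storage through storageSpec.volumeClaimTemplate;
# verify the key against the values.yaml of your chart version.
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
```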
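
The same retention and storage settings can also live in a values file instead of `--set` flags, which is easier to review and version. The sketch below assumes the chart's `storageSpec.volumeClaimTemplate` structure and a hypothetical file name `values.yaml`; the access mode is also an assumption, so check it against the values reference for your chart version.

```yaml
# values.yaml (hypothetical file name)
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]   # assumed; depends on your storage class
          resources:
            requests:
              storage: 50Gi
```

Pass the file to the install command with `-f values.yaml`.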

Docker Compose

```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'

volumes:
  prometheus-data:
```
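
The `prometheus.yml` in the next section targets `alertmanager:9093` and loads rules from `/etc/prometheus/rules/`, neither of which the Compose file above provides. A minimal sketch of one way to extend it, assuming a local `./rules` directory and a local `./alertmanager.yml` (both hypothetical paths):

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # Assumed local directory holding recording and alert rule files
      - ./rules:/etc/prometheus/rules:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'

  # The service name doubles as the hostname used in prometheus.yml
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      # Assumed local config, mounted at the image's default config path
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro

volumes:
  prometheus-data:
```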

Configuration File

prometheus.yml:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    region: 'us-west-2'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

# Load rules files
rule_files:
  - /etc/prometheus/rules/*.yml

# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node exporters
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'node1:9100'
          - 'node2:9100'
          - 'node3:9100'
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+)(:[0-9]+)?'
        replacement: '${1}'

  # Kubernetes pods with annotations
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod

  # Application metrics
  - job_name: 'my-app'
    static_configs:
      - targets:
          - 'app1.example.com:9090'
          - 'app2.example.com:9090'
    metrics_path: '/metrics'
    scheme: 'https'
    tls_config:
      ca_file: /etc/prometheus/ca.crt
      cert_file: /etc/prometheus/client.crt
      key_file: /etc/prometheus/client.key
```

Reference: See assets/prometheus.yml.template
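
The `kubernetes-pods` job above only keeps pods that opt in through annotations. Kubernetes service discovery exposes each annotation as a `__meta_kubernetes_pod_annotation_*` label with non-alphanumeric characters replaced by underscores. A minimal sketch of a pod that the job would pick up follows; the pod name, image, and port are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app               # hypothetical name
  namespace: default
  annotations:
    prometheus.io/scrape: "true"  # matched by the "keep" rule
    prometheus.io/path: "/metrics" # copied into __metrics_path__
    prometheus.io/port: "8080"    # rewritten into __address__
spec:
  containers:
    - name: app
      image: example/app:1.0      # hypothetical image
      ports:
        - containerPort: 8080
```

Annotation values must be quoted strings; an unquoted `true` or `8080` is rejected by the Kubernetes API because annotations are string-valued.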

Scrape Configurations

Static Targets

```yaml
scrape_configs:
  - job_name: 'static-targets'
    static_configs:
      - targets: ['host1:9100', 'host2:9100']
        labels:
          env: 'production'
          region: 'us-west-2'
```

File-based Service Discovery

```yaml
scrape_configs:
  - job_name: 'file-sd'
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/*.json
          - /etc/prometheus/targets/*.yml
        refresh_interval: 5m
```

targets/production.json:

```json
[
  {
    "targets": ["app1:9090", "app2:9090"],
    "labels": {
      "env": "production",
      "service": "api"
    }
  }
]
```
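
The `file-sd` job also globs `*.yml`, so target groups can be written in YAML with the same structure as the JSON file. A minimal sketch, using a hypothetical file `targets/staging.yml` with hypothetical hosts:

```yaml
# targets/staging.yml (hypothetical file and hosts)
- targets:
    - 'app3:9090'
    - 'app4:9090'
  labels:
    env: staging
    service: api
```

Prometheus watches these files and reloads them on change, with `refresh_interval` acting as a periodic fallback, so targets can be added without restarting the server.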

Kubernetes Service Discovery

```yaml
scrape_configs:
  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```

Reference: See references/scrape-configs.md

Recording Rules

Create pre-computed metrics for frequently queried expressions:

```yaml
# /etc/prometheus/rules/recording_rules.yml
groups:
  - name: api_metrics
    interval: 15s
    rules:
      # HTTP request rate per service
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      # Error rate percentage
      - record: job:http_requests_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))

      - record: job:http_requests_error_rate:percentage
        expr: |
          (job:http_requests_errors:rate5m / job:http_requests:rate5m) * 100

      # P95 latency
      - record: job:http_request_duration:p95
        expr: |
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )

  - name: resource_metrics
    interval: 30s
    rules:
      # CPU utilization percentage
      - record: instance:node_cpu:utilization
        expr: |
          100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

      # Memory utilization percentage
      - record: instance:node_memory:utilization
        expr: |
          100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)

      # Disk usage percentage
      - record: instance:node_disk:utilization
        expr: |
          100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)
```

Reference: See references/recording-rules.md
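
Recording rules can be checked against synthetic data before deployment. The following is a minimal sketch of a `promtool` unit test for the memory utilization rule above. The test file name and the `node1` instance are hypothetical, and the `rule_files` path assumes the test sits next to `recording_rules.yml`.

```yaml
# recording_rules_test.yml (hypothetical file name)
rule_files:
  - recording_rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    # Constant series: 2 GiB available out of 8 GiB total
    input_series:
      - series: 'node_memory_MemAvailable_bytes{instance="node1"}'
        values: '2147483648+0x10'
      - series: 'node_memory_MemTotal_bytes{instance="node1"}'
        values: '8589934592+0x10'
    promql_expr_test:
      - expr: instance:node_memory:utilization
        eval_time: 5m
        exp_samples:
          # 100 - (2 GiB / 8 GiB) * 100 = 75
          - labels: 'instance:node_memory:utilization{instance="node1"}'
            value: 75
```

Run it with `promtool test rules recording_rules_test.yml`. The input values are powers of two so the expected result is exact; rules built on `rate()` may need more care with floating-point comparison.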

Alert Rules

```yaml
# /etc/prometheus/rules/alert_rules.yml
groups:
  - name: availability
    interval: 30s
    rules:
      - alert: ServiceDown
        expr: up{job="my-app"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.instance }} is down"
          description: "{{ $labels.job }} has been down for more than 1 minute"

      - alert: HighErrorRate
        expr: job:http_requests_error_rate:percentage > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate for {{ $labels.job }}"
          description: "Error rate is {{ $value }}% (threshold: 5%)"

      - alert: HighLatency
        expr: job:http_request_duration:p95 > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency for {{ $labels.job }}"
          description: "P95 latency is {{ $value }}s (threshold: 1s)"

  - name: resources
    interval: 1m
    rules:
      - alert: HighCPUUsage
        expr: instance:node_cpu:utilization > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      - alert: HighMemoryUsage
        expr: instance:node_memory:utilization > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}%"

      - alert: DiskSpaceLow
        expr: instance:node_disk:utilization > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk usage is {{ $value }}%"
```
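
The `severity` labels set above only matter once Alertmanager routes on them. A minimal sketch of an `alertmanager.yml` that sends `critical` alerts to a separate receiver is shown here; the receiver names, webhook URLs, and timing values are assumptions, not part of this skill.

```yaml
# alertmanager.yml (sketch; receivers and URLs are hypothetical)
route:
  receiver: default              # fallback for anything not matched below
  group_by: ['alertname', 'job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="critical"
      receiver: oncall

receivers:
  - name: default
    webhook_configs:
      - url: 'http://alert-sink.example.internal/warning'
  - name: oncall
    webhook_configs:
      - url: 'http://alert-sink.example.internal/critical'
```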

Validation

```bash
# Validate configuration
promtool check config prometheus.yml

# Validate rules
promtool check rules /etc/prometheus/rules/*.yml

# Test query
promtool query instant http://localhost:9090 'up'
```

Reference: See scripts/validate-prometheus.sh

Best Practices

  1. Use consistent naming for metrics (prefix_name_unit)
  2. Set appropriate scrape intervals (15-60s typical)
  3. Use recording rules for expensive queries
  4. Implement high availability (multiple Prometheus instances)
  5. Configure retention based on storage capacity
  6. Use relabeling for metric cleanup
  7. Monitor Prometheus itself
  8. Implement federation for large deployments
  9. Use Thanos/Cortex for long-term storage
  10. Document custom metrics
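
Items 6 and 8 can be sketched as two scrape jobs: one that cleans up series at ingestion time with `metric_relabel_configs`, and one that pulls aggregated series from other Prometheus servers through the `/federate` endpoint. The dropped metric pattern, the `request_id` label, and the shard hostnames are hypothetical.

```yaml
scrape_configs:
  # Metric cleanup: applied to scraped samples before they are stored
  - job_name: 'my-app'
    static_configs:
      - targets: ['app1.example.com:9090']
    metric_relabel_configs:
      # Drop series by metric name (pattern is an assumption)
      - source_labels: [__name__]
        regex: 'go_gc_.*'
        action: drop
      # Remove a high-cardinality label (hypothetical label name)
      - regex: 'request_id'
        action: labeldrop

  # Federation: a global server scraping recorded series from shards
  - job_name: 'federate'
    honor_labels: true           # keep the labels set by the source servers
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"job:.*"}' # only the job-level recording rules
    static_configs:
      - targets:
          - 'prometheus-shard-1:9090'   # hypothetical hosts
          - 'prometheus-shard-2:9090'
```

Use `labeldrop` with care: if two series differ only in the dropped label, they collide after relabeling. Federating only pre-aggregated recording rules keeps the global server's load small.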

Troubleshooting

Check scrape targets:

```bash
curl http://localhost:9090/api/v1/targets
```

Check configuration:

```bash
curl http://localhost:9090/api/v1/status/config
```

Test query:

```bash
curl 'http://localhost:9090/api/v1/query?query=up'
```

Reference Files

  • assets/prometheus.yml.template - Complete configuration template
  • references/scrape-configs.md - Scrape configuration patterns
  • references/recording-rules.md - Recording rule examples
  • scripts/validate-prometheus.sh - Validation script

Related Skills

  • grafana-dashboards - For visualization
  • slo-implementation - For SLO monitoring
  • distributed-tracing - For request tracing