slo-implementation

Skill from rmyndharis/antigravity-skills

What it does

Defines and implements Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to measure and improve service reliability targets.

Part of rmyndharis/antigravity-skills (289 items)

Installation

npm run build:catalog
npx @rmyndharis/antigravity-skills search <query>
npx @rmyndharis/antigravity-skills search kubernetes
npx @rmyndharis/antigravity-skills list
npx @rmyndharis/antigravity-skills install <skill-name>

Extracted from docs: rmyndharis/antigravity-skills
10 installs
Added Feb 4, 2026

Skill Details

SKILL.md

Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.

Overview

# SLO Implementation

Framework for defining and implementing Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.

Do not use this skill when

  • The task is unrelated to SLO implementation
  • You need a different domain or tool outside this scope

Instructions

  • Clarify goals, constraints, and required inputs.
  • Apply relevant best practices and validate outcomes.
  • Provide actionable steps and verification.
  • If detailed examples are required, open resources/implementation-playbook.md.

Purpose

Implement measurable reliability targets using SLIs, SLOs, and error budgets to balance reliability with innovation velocity.

Use this skill when

  • Defining service reliability targets
  • Measuring user-perceived reliability
  • Implementing error budgets
  • Creating SLO-based alerts
  • Tracking reliability goals

SLI/SLO/SLA Hierarchy

```
SLA (Service Level Agreement)
    ↓ Contract with customers
SLO (Service Level Objective)
    ↓ Internal reliability target
SLI (Service Level Indicator)
    ↓ Actual measurement
```

Defining SLIs

Common SLI Types

#### 1. Availability SLI

```promql
# Successful requests / Total requests
sum(rate(http_requests_total{status!~"5.."}[28d]))
/
sum(rate(http_requests_total[28d]))
```

#### 2. Latency SLI

```promql
# Requests below latency threshold / Total requests
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[28d]))
/
sum(rate(http_request_duration_seconds_count[28d]))
```

#### 3. Durability SLI

```promql
# Successful writes / Total writes
sum(storage_writes_successful_total)
/
sum(storage_writes_total)
```

Reference: See references/slo-definitions.md
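All three SLIs above share one shape: good events divided by total events. As a minimal sketch of that ratio outside Prometheus, the Python below uses a hypothetical helper `sli_ratio` and made-up event counts. Treating a period with no traffic as fully good is an assumption, not something the source specifies.

```python
def sli_ratio(good_events: float, total_events: float) -> float:
    """Return good/total as a fraction between 0 and 1."""
    if good_events < 0 or total_events < 0 or good_events > total_events:
        raise ValueError("expected 0 <= good_events <= total_events")
    if total_events == 0:
        # Assumption: no traffic means nothing failed, so report 100%.
        return 1.0
    return good_events / total_events


# Illustrative counts only; real values would come from your metrics backend.
availability = sli_ratio(good_events=999_500, total_events=1_000_000)
latency = sli_ratio(good_events=992_000, total_events=1_000_000)

print(f"availability SLI: {availability:.4%}")
print(f"latency SLI:      {latency:.4%}")
```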

Setting SLO Targets

Availability SLO Examples

| SLO %  | Downtime/Month (30 days) | Downtime/Year  |
|--------|--------------------------|----------------|
| 99%    | 7.2 hours                | 3.65 days      |
| 99.9%  | 43.2 minutes             | 8.76 hours     |
| 99.95% | 21.6 minutes             | 4.38 hours     |
| 99.99% | 4.32 minutes             | 52.56 minutes  |
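The downtime figures follow from multiplying the period length by the allowed failure fraction. A short sketch, assuming a 30-day month and a 365-day year (the function name `allowed_downtime_minutes` is illustrative):

```python
def allowed_downtime_minutes(slo_percent: float, period_days: float) -> float:
    """Minutes of full downtime a period can absorb while still meeting the SLO."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - slo_percent / 100)


for slo in (99.0, 99.9, 99.95, 99.99):
    per_month = allowed_downtime_minutes(slo, period_days=30)
    per_year = allowed_downtime_minutes(slo, period_days=365)
    print(f"{slo}%: {per_month:.2f} min/month, {per_year:.2f} min/year")
```

With a 28-day window, as used in the SLO definitions in this document, the same 99.9% target allows about 40.32 minutes rather than 43.2.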

Choose Appropriate SLOs

Consider:

  • User expectations
  • Business requirements
  • Current performance
  • Cost of reliability
  • Competitor benchmarks

Example SLOs:

```yaml
slos:
  - name: api_availability
    target: 99.9
    window: 28d
    sli: |
      sum(rate(http_requests_total{status!~"5.."}[28d]))
      /
      sum(rate(http_requests_total[28d]))

  - name: api_latency_p95
    target: 99
    window: 28d
    sli: |
      sum(rate(http_request_duration_seconds_bucket{le="0.5"}[28d]))
      /
      sum(rate(http_request_duration_seconds_count[28d]))
```

Error Budget Calculation

Error Budget Formula

```

Error Budget = 1 - SLO Target

```

Example:

  • SLO: 99.9% availability
  • Error Budget: 0.1% = 43.2 minutes/month
  • Current Error: 0.05% = 21.6 minutes/month
  • Remaining Budget: 50%
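The arithmetic in this example can be checked with a few lines of Python. This is a sketch only: the helper names are illustrative, and the month is assumed to be 30 days, which is what the 43.2-minute figure implies.

```python
def error_budget(slo_target: float) -> float:
    """Error budget as a fraction, e.g. 0.999 -> 0.001."""
    return 1 - slo_target


def budget_remaining(slo_target: float, observed_error_rate: float) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    return 1 - observed_error_rate / error_budget(slo_target)


MINUTES_PER_30_DAYS = 30 * 24 * 60  # assumption: 30-day month

budget = error_budget(0.999)
print(f"budget:    {budget:.2%} = {budget * MINUTES_PER_30_DAYS:.1f} min/month")
print(f"consumed:  {0.0005:.2%} = {0.0005 * MINUTES_PER_30_DAYS:.1f} min/month")
print(f"remaining: {budget_remaining(0.999, 0.0005):.0%}")
```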

Error Budget Policy

```yaml
error_budget_policy:
  - remaining_budget: 100%
    action: Normal development velocity
  - remaining_budget: 50%
    action: Consider postponing risky changes
  - remaining_budget: 10%
    action: Freeze non-critical changes
  - remaining_budget: 0%
    action: Feature freeze, focus on reliability
```
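The policy lists four thresholds but does not say how values between them map to actions. The sketch below assumes each action applies once the remaining budget falls to or below its threshold; that banding, and the function name `policy_action`, are assumptions made for illustration.

```python
def policy_action(remaining_percent: float) -> str:
    """Map remaining error budget (in percent) to a policy action.

    Assumed bands: <= 0, (0, 10], (10, 50], and above 50.
    """
    if remaining_percent <= 0:
        return "Feature freeze, focus on reliability"
    if remaining_percent <= 10:
        return "Freeze non-critical changes"
    if remaining_percent <= 50:
        return "Consider postponing risky changes"
    return "Normal development velocity"


for remaining in (100, 65, 50, 8, 0, -12):
    print(f"{remaining:>4}% remaining -> {policy_action(remaining)}")
```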

Reference: See references/error-budget.md

SLO Implementation

Prometheus Recording Rules

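```yaml
# SLI Recording Rules
groups:
  - name: sli_rules
    interval: 30s
    rules:
      # Availability SLI
      - record: sli:http_availability:ratio
        expr: |
          sum(rate(http_requests_total{status!~"5.."}[28d]))
          /
          sum(rate(http_requests_total[28d]))

      # Latency SLI (requests < 500ms)
      - record: sli:http_latency:ratio
        expr: |
          sum(rate(http_request_duration_seconds_bucket{le="0.5"}[28d]))
          /
          sum(rate(http_request_duration_seconds_count[28d]))

  - name: slo_rules
    interval: 5m
    rules:
      # SLO compliance (1 = meeting SLO, 0 = violating)
      - record: slo:http_availability:compliance
        expr: sli:http_availability:ratio >= bool 0.999

      - record: slo:http_latency:compliance
        expr: sli:http_latency:ratio >= bool 0.99

      # Error budget remaining (percentage)
      - record: slo:http_availability:error_budget_remaining
        expr: |
          (sli:http_availability:ratio - 0.999) / (1 - 0.999) * 100

      # Error budget burn rate (1 = spending the budget exactly over the SLO window)
      - record: slo:http_availability:burn_rate_5m
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total[5m]))
          )) / (1 - 0.999)

      # The alerting rules also reference 30m, 1h, and 6h burn rates;
      # they follow the same pattern with a different range.
      - record: slo:http_availability:burn_rate_30m
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[30m]))
            /
            sum(rate(http_requests_total[30m]))
          )) / (1 - 0.999)

      - record: slo:http_availability:burn_rate_1h
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[1h]))
            /
            sum(rate(http_requests_total[1h]))
          )) / (1 - 0.999)

      - record: slo:http_availability:burn_rate_6h
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[6h]))
            /
            sum(rate(http_requests_total[6h]))
          )) / (1 - 0.999)
```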
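The burn-rate expression divides the observed error ratio by the error budget, so a value of 1 means the budget would last exactly one SLO window. A minimal Python sketch of the same calculation, using a hypothetical `burn_rate` helper and made-up request counts:

```python
def burn_rate(failed: float, total: float, slo_target: float = 0.999) -> float:
    """Observed error ratio divided by the error budget (1 - SLO target)."""
    if total == 0:
        return 0.0  # assumption: no traffic burns no budget
    return (failed / total) / (1 - slo_target)


# 10 failures in 10,000 requests is exactly the 0.1% budget: burn rate near 1.
print(f"{burn_rate(failed=10, total=10_000):.2f}x")
# 144 failures in 10,000 requests burns the budget about 14.4 times too fast.
print(f"{burn_rate(failed=144, total=10_000):.2f}x")
```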

SLO Alerting Rules

```yaml
groups:
  - name: slo_alerts
    interval: 1m
    rules:
      # Fast burn: 14.4x rate, 1 hour window
      # Consumes 2% of a 30-day error budget in 1 hour
      - alert: SLOErrorBudgetBurnFast
        expr: |
          slo:http_availability:burn_rate_1h > 14.4
          and
          slo:http_availability:burn_rate_5m > 14.4
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Fast error budget burn detected"
          description: "Error budget burning at {{ $value }}x rate"

      # Slow burn: 6x rate, 6 hour window
      # Consumes 5% of a 30-day error budget in 6 hours
      - alert: SLOErrorBudgetBurnSlow
        expr: |
          slo:http_availability:burn_rate_6h > 6
          and
          slo:http_availability:burn_rate_30m > 6
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Slow error budget burn detected"
          description: "Error budget burning at {{ $value }}x rate"

      # Error budget exhausted
      - alert: SLOErrorBudgetExhausted
        expr: slo:http_availability:error_budget_remaining < 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "SLO error budget exhausted"
          description: "Error budget remaining: {{ $value }}%"
```
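The 14.4 and 6 thresholds can be derived from the share of budget an alert should catch and the two window lengths. The sketch below assumes a 30-day SLO window, which is what reproduces those exact numbers. With the 28-day window used by the SLIs in this document, the equivalent thresholds come out slightly lower. The function name is illustrative.

```python
def burn_rate_threshold(
    budget_fraction: float, alert_window_hours: float, slo_window_days: float
) -> float:
    """Burn rate that spends `budget_fraction` of the budget in the alert window."""
    slo_window_hours = slo_window_days * 24
    return budget_fraction * slo_window_hours / alert_window_hours


for days in (30, 28):
    fast = burn_rate_threshold(0.02, alert_window_hours=1, slo_window_days=days)
    slow = burn_rate_threshold(0.05, alert_window_hours=6, slo_window_days=days)
    print(f"{days}-day window: fast burn > {fast:.2f}x, slow burn > {slow:.2f}x")
```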

SLO Dashboard

Grafana Dashboard Structure:

```
┌────────────────────────────────────┐
│ SLO Compliance (Current)           │
│ ✓ 99.95% (Target: 99.9%)           │
├────────────────────────────────────┤
│ Error Budget Remaining: 65%        │
│ ████████░░ 65%                     │
├────────────────────────────────────┤
│ SLI Trend (28 days)                │
│ [Time series graph]                │
├────────────────────────────────────┤
│ Burn Rate Analysis                 │
│ [Burn rate by time window]         │
└────────────────────────────────────┘
```

Example Queries:

```promql
# Current SLO compliance (SLI as a percentage, to compare with the 99.9% target)
sli:http_availability:ratio * 100

# Error budget remaining
slo:http_availability:error_budget_remaining

# Days until error budget exhausted (at the 28-day average burn rate)
# = remaining fraction * window length / burn rate
(slo:http_availability:error_budget_remaining / 100)
*
28
/
((1 - sli:http_availability:ratio) / (1 - 0.999))
```
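The exhaustion estimate is the remaining budget fraction multiplied by the window length and divided by the burn rate. A small Python sketch of that projection follows. The helper name and sample values are illustrative, and returning infinity when nothing is burning is an assumption.

```python
import math


def days_until_exhausted(
    remaining_fraction: float, burn_rate: float, window_days: float = 28
) -> float:
    """Project how many days the remaining budget lasts at a constant burn rate."""
    if remaining_fraction <= 0:
        return 0.0  # budget already spent
    if burn_rate <= 0:
        return math.inf  # assumption: no burn means the budget never runs out
    return remaining_fraction * window_days / burn_rate


# 65% of the budget left, burning at twice the sustainable rate.
print(f"{days_until_exhausted(0.65, burn_rate=2.0):.1f} days")
```

A burn rate of 1 with a full budget gives exactly one window (28 days), which is a quick sanity check on the formula.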

Multi-Window Burn Rate Alerts

```yaml
# Combination of short and long windows reduces false positives
rules:
  - alert: SLOBurnRateHigh
    expr: |
      (
        slo:http_availability:burn_rate_1h > 14.4
        and
        slo:http_availability:burn_rate_5m > 14.4
      )
      or
      (
        slo:http_availability:burn_rate_6h > 6
        and
        slo:http_availability:burn_rate_30m > 6
      )
    labels:
      severity: critical
```
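The boolean structure of this rule is easier to see outside PromQL. The sketch below mirrors it in Python: the long window shows that the burn is significant, and the short window shows that it is still happening. The function name and the dictionary keys are assumptions chosen to match the window suffixes above.

```python
def should_alert(burn_rates: dict[str, float]) -> bool:
    """Return True when either window pair exceeds its threshold on both windows."""
    fast_burn = burn_rates["1h"] > 14.4 and burn_rates["5m"] > 14.4
    slow_burn = burn_rates["6h"] > 6 and burn_rates["30m"] > 6
    return fast_burn or slow_burn


# A spike that already ended: the 1h rate is high but the 5m rate has recovered.
recovered = {"5m": 0.8, "30m": 4.0, "1h": 15.0, "6h": 3.0}
# A sustained moderate burn: both slow-burn windows are over threshold.
sustained = {"5m": 7.5, "30m": 7.0, "1h": 7.2, "6h": 6.5}

print(should_alert(recovered))
print(should_alert(sustained))
```

Requiring both windows means a spike that has already recovered does not page anyone, which is the false-positive reduction the rule comment refers to.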

SLO Review Process

Weekly Review

  • Current SLO compliance
  • Error budget status
  • Trend analysis
  • Incident impact

Monthly Review

  • SLO achievement
  • Error budget usage
  • Incident postmortems
  • SLO adjustments

Quarterly Review

  • SLO relevance
  • Target adjustments
  • Process improvements
  • Tooling enhancements

Best Practices

  1. Start with user-facing services
  2. Use multiple SLIs (availability, latency, etc.)
  3. Set achievable SLOs (don't aim for 100%)
  4. Implement multi-window alerts to reduce noise
  5. Track error budget consistently
  6. Review SLOs regularly
  7. Document SLO decisions
  8. Align with business goals
  9. Automate SLO reporting
  10. Use SLOs for prioritization

Reference Files

  • assets/slo-template.md - SLO definition template
  • references/slo-definitions.md - SLO definition patterns
  • references/error-budget.md - Error budget calculations

Related Skills

  • prometheus-configuration - For metric collection
  • grafana-dashboards - For SLO visualization