chaos-engineering-resilience

from proffesor-for-testing/agentic-qe

What it does

chaos-engineering-resilience skill from proffesor-for-testing/agentic-qe

Installation

Install skill:
npx skills add https://github.com/proffesor-for-testing/agentic-qe --skill chaos-engineering-resilience
Added Jan 27, 2026

Skill Details

SKILL.md

"Chaos engineering principles, controlled failure injection, resilience testing, and system recovery validation. Use when testing distributed systems, building confidence in fault tolerance, or validating disaster recovery."

Overview

# Chaos Engineering & Resilience Testing

When testing system resilience or injecting failures:

  1. DEFINE steady state (normal metrics: error rate, latency, throughput)
  2. HYPOTHESIZE system continues in steady state during failure
  3. INJECT real-world failures (network, instance, disk, CPU)
  4. OBSERVE and measure deviation from steady state
  5. FIX weaknesses discovered, document runbooks, repeat

Quick Chaos Steps:

  • Start small: Dev → Staging → 1% prod → gradual rollout
  • Define clear rollback triggers (error_rate > 5%)
  • Measure blast radius, never exceed planned scope
  • Document findings → runbooks → improved resilience

Critical Success Factors:

  • Controlled experiments with automatic rollback
  • Steady state must be measurable
  • Start in non-production, graduate to production
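
The five steps and the rollback trigger above can be sketched as a small experiment loop. This is a minimal illustration under stated assumptions, not the skill's implementation: `ExperimentHooks`, `runExperiment`, and the option names are hypothetical, and the caller supplies the functions that measure, inject, and roll back.

```typescript
// Hypothetical shapes for illustration only; not part of agentic-qe.
interface Metrics {
  errorRate: number; // fraction of failed requests, e.g. 0.001 = 0.1%
}

interface ExperimentHooks {
  measure: () => Promise<Metrics>; // reads current metrics
  inject: () => Promise<void>; // starts the failure
  rollback: () => Promise<void>; // removes the failure
}

interface ExperimentResult {
  baseline: Metrics;
  observed: Metrics[];
  rolledBackEarly: boolean;
  hypothesisHeld: boolean;
}

async function runExperiment(
  hooks: ExperimentHooks,
  opts: { steadyStateMax: number; abortAbove: number; samples: number },
): Promise<ExperimentResult> {
  // DEFINE: refuse to start unless the system is already in steady state.
  const baseline = await hooks.measure();
  if (baseline.errorRate > opts.steadyStateMax) {
    throw new Error("System is not in steady state; aborting before injection");
  }

  const observed: Metrics[] = [];
  let rolledBackEarly = false;

  // INJECT the failure, then OBSERVE up to a fixed number of samples.
  await hooks.inject();
  try {
    for (let i = 0; i < opts.samples; i++) {
      const sample = await hooks.measure();
      observed.push(sample);
      if (sample.errorRate > opts.abortAbove) {
        rolledBackEarly = true; // rollback trigger fired, stop observing
        break;
      }
    }
  } finally {
    // Always remove the failure, even if measuring throws.
    await hooks.rollback();
  }

  // HYPOTHESIS: every sample stayed within the steady-state bound.
  const hypothesisHeld =
    !rolledBackEarly &&
    observed.every((m) => m.errorRate <= opts.steadyStateMax);

  return { baseline, observed, rolledBackEarly, hypothesisHeld };
}
```

Placing the rollback in `finally` is a deliberate choice in this sketch: the injected failure is removed whether the run completes, the trigger fires, or a measurement call fails.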

Quick Reference Card

When to Use

  • Distributed systems validation
  • Disaster recovery testing
  • Building confidence in fault tolerance
  • Pre-production resilience verification

Failure Types to Inject

| Category | Failures | Tools |
|----------|----------|-------|
| Network | Latency, packet loss, partition | tc, toxiproxy |
| Infrastructure | Instance kill, disk failure, CPU | Chaos Monkey |
| Application | Exceptions, slow responses, leaks | Gremlin, LitmusChaos |
| Dependencies | Service outage, timeout | WireMock |

Blast Radius Progression

```
Dev (safe) → Staging → 1% prod → 10% → 50% → 100%
    ↓           ↓         ↓                    ↓
  Learn      Validate   Careful          Full confidence
```
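
One way to encode the progression above is as an ordered list of stages that only advances after a passing run. The sketch below is illustrative: `PROGRESSION`, `nextStageIndex`, and `prodTargets` are hypothetical names, and rounding the target count down is an assumption made so that the planned scope is never exceeded.

```typescript
// Stages taken from the progression above; the data shape is an assumption.
type Stage =
  | { env: "dev" | "staging" }
  | { env: "prod"; percent: number };

const PROGRESSION: Stage[] = [
  { env: "dev" },
  { env: "staging" },
  { env: "prod", percent: 1 },
  { env: "prod", percent: 10 },
  { env: "prod", percent: 50 },
  { env: "prod", percent: 100 },
];

// Advance one stage only after a passing experiment; otherwise stay put
// so the weakness can be fixed and the same stage repeated.
function nextStageIndex(current: number, passed: boolean): number {
  if (!passed) return current;
  return Math.min(current + 1, PROGRESSION.length - 1);
}

// Number of production instances to target at a given percentage.
// Rounds down, so a small fleet may yield 0 targets at the 1% stage.
function prodTargets(percent: number, totalInstances: number): number {
  return Math.floor((totalInstances * percent) / 100);
}
```

With 200 production instances, the 1% stage targets 2 of them. With 50 instances it targets none, which signals that the fleet is too small for that stage rather than silently widening the blast radius.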

Steady State Metrics

| Metric | Normal | Alert Threshold |
|--------|--------|-----------------|
| Error rate | < 0.1% | > 1% |
| p99 latency | < 200ms | > 500ms |
| Throughput | baseline | -20% |
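
The alert thresholds in the table can be checked mechanically against a metrics snapshot. The following is a minimal sketch, assuming rates are expressed as fractions and that "-20%" means throughput more than 20% below the measured baseline. `Snapshot`, `ALERT`, and `findViolations` are hypothetical names.

```typescript
// Hypothetical snapshot shape; field names are assumptions.
interface Snapshot {
  errorRate: number; // fraction, 0.01 = 1%
  p99LatencyMs: number; // milliseconds
  throughput: number; // requests per second
}

// Alert thresholds copied from the table above.
const ALERT = { errorRate: 0.01, p99LatencyMs: 500, throughputDrop: 0.2 };

// Returns the names of all metrics that crossed their alert threshold.
function findViolations(
  current: Snapshot,
  baselineThroughput: number,
): string[] {
  const violations: string[] = [];
  if (current.errorRate > ALERT.errorRate) violations.push("errorRate");
  if (current.p99LatencyMs > ALERT.p99LatencyMs) violations.push("p99Latency");
  // Throughput alerts when it falls more than 20% below the baseline.
  const floor = baselineThroughput * (1 - ALERT.throughputDrop);
  if (current.throughput < floor) violations.push("throughput");
  return violations;
}
```

An empty result means the snapshot is within all alert thresholds. Any non-empty result names the metrics that deviated.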

---

Chaos Experiment Structure

```typescript
// Chaos experiment definition
const experiment = {
  name: 'Database latency injection',
  hypothesis: 'System handles 500ms DB latency gracefully',
  steadyState: {
    errorRate: '< 0.1%',
    p99Latency: '< 300ms'
  },
  method: {
    type: 'network-latency',
    target: 'database',
    delay: '500ms',
    duration: '5m'
  },
  rollback: {
    automatic: true,
    trigger: 'errorRate > 5%'
  }
};
```
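
The definition above stores bounds and durations as strings such as `'< 0.1%'`, `'errorRate > 5%'`, and `'5m'`. To act on them, a runner would need to turn them into numbers. The helpers below are a hedged sketch of one way to do that; `parsePercentBound`, `parseDurationMs`, and `isBreached` are hypothetical and only cover the percentage and `ms`/`s`/`m` formats that appear in the example.

```typescript
interface Bound {
  op: "<" | ">";
  value: number; // fraction, so '5%' becomes 0.05
}

// Parses '< 0.1%' or 'errorRate > 5%' (an optional metric name is ignored).
function parsePercentBound(text: string): Bound {
  const match = /^\s*(?:\w+\s*)?([<>])\s*(\d+(?:\.\d+)?)%\s*$/.exec(text);
  if (!match) throw new Error(`Unrecognized bound: ${text}`);
  return { op: match[1] as "<" | ">", value: Number(match[2]) / 100 };
}

// Parses '500ms', '30s', or '5m' into milliseconds.
function parseDurationMs(text: string): number {
  const match = /^(\d+)(ms|s|m)$/.exec(text);
  if (!match) throw new Error(`Unrecognized duration: ${text}`);
  const [, amount, unit] = match;
  const factor = unit === "ms" ? 1 : unit === "s" ? 1000 : 60_000;
  return Number(amount) * factor;
}

// True when the observed value satisfies the comparison in the bound,
// e.g. a rollback trigger of '> 5%' is breached at an observed 0.08.
function isBreached(bound: Bound, observed: number): boolean {
  return bound.op === ">" ? observed > bound.value : observed < bound.value;
}
```

For example, `isBreached(parsePercentBound('errorRate > 5%'), 0.08)` evaluates the rollback trigger against an observed 8% error rate, and `parseDurationMs('5m')` gives the experiment duration in milliseconds.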

---

Agent-Driven Chaos

```typescript
// qe-chaos-engineer runs controlled experiments
await Task("Chaos Experiment", {
  target: 'payment-service',
  failure: 'terminate-random-instance',
  blastRadius: '10%',
  duration: '5m',
  steadyStateHypothesis: {
    metric: 'success-rate',
    threshold: 0.99
  },
  autoRollback: true
}, "qe-chaos-engineer");

// Validates:
// - System recovers automatically
// - Error rate stays within threshold
// - No data loss
// - Alerts triggered appropriately
```

---

Agent Coordination Hints

Memory Namespace

```
aqe/chaos-engineering/
├── experiments/*    - Experiment definitions & results
├── steady-states/*  - Baseline measurements
├── runbooks/*       - Generated recovery procedures
└── blast-radius/*   - Impact analysis
```

Fleet Coordination

```typescript
const chaosFleet = await FleetManager.coordinate({
  strategy: 'chaos-engineering',
  agents: [
    'qe-chaos-engineer',          // Experiment execution
    'qe-performance-tester',      // Baseline metrics
    'qe-production-intelligence'  // Production monitoring
  ],
  topology: 'sequential'
});
```

---

Related Skills

  • [shift-right-testing](../shift-right-testing/) - Production testing
  • [performance-testing](../performance-testing/) - Load testing
  • [test-environment-management](../test-environment-management/) - Environment stability

---

Remember

Break things on purpose to prevent unplanned outages. Find weaknesses before users do. Define steady state, inject failures, measure impact, fix weaknesses, create runbooks. Start small, increase blast radius gradually.

With Agents: qe-chaos-engineer automates chaos experiments with blast radius control, automatic rollback, and comprehensive resilience validation, and generates runbooks from experiment results.
