# karpenter-autoscaling

🎯 Skill from adaptationio/skrillz
Automates intelligent Kubernetes node autoscaling on EKS, optimizing instance selection and reducing costs by 20-70% through Spot integration and consolidation.
## Installation

```
/plugin marketplace add adaptationio/Skrillz
/plugin install skrillz@adaptationio-Skrillz
/plugin enable skrillz@adaptationio-Skrillz
/plugin marketplace add /path/to/skrillz
/plugin install skrillz@local
```
## Skill Details
Karpenter for intelligent Kubernetes node autoscaling on EKS. Use when configuring node provisioning, optimizing costs with Spot instances, replacing Cluster Autoscaler, implementing consolidation, or achieving 20-70% cost savings.
# Karpenter Autoscaling for Amazon EKS
Intelligent, high-performance node autoscaling for Amazon EKS that provisions nodes in seconds, automatically selects optimal instance types, and reduces costs by 20-70% through Spot integration and consolidation.
## Overview
Karpenter is the recommended autoscaler for production EKS workloads (2025), replacing Cluster Autoscaler with:
- Speed: Provisions nodes in seconds (vs minutes with Cluster Autoscaler)
- Intelligence: Automatically selects optimal instance types based on pod requirements
- Flexibility: No need to configure node groups - direct EC2 instance provisioning
- Cost Optimization: 20-70% cost reduction through better bin-packing and Spot integration
- Consolidation: Automatic node consolidation when underutilized or empty
Real-World Results:
- 20% overall AWS bill reduction
- Up to 90% savings for CI/CD workloads
- 70% reduction in monthly compute costs
- 15-30% waste reduction with faster scale-up
## When to Use
- Replacing Cluster Autoscaler with faster, smarter autoscaling
- Optimizing EKS cluster costs (target: 20%+ savings)
- Implementing Spot instance strategies (30-70% Spot mix)
- Need sub-minute node provisioning (seconds vs minutes)
- Workloads with variable resource requirements
- Multi-instance-type flexibility without node group management
- GPU or specialized instance provisioning
- Consolidating underutilized nodes automatically
## Prerequisites
- EKS cluster running Kubernetes 1.23+
- Terraform or Helm for installation
- IRSA or EKS Pod Identity enabled
- Small node group for Karpenter controller (2-3 nodes)
- VPC subnets and security groups tagged for Karpenter discovery
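The discovery tags in the last item can be applied with the AWS CLI. A minimal sketch, assuming a cluster named `my-cluster` and that the cluster's own subnets and security group should be made discoverable:

```bash
CLUSTER=my-cluster

# Tag the cluster's subnets so Karpenter's subnetSelectorTerms can find them
SUBNETS=$(aws eks describe-cluster --name "$CLUSTER" \
  --query 'cluster.resourcesVpcConfig.subnetIds' --output text)
aws ec2 create-tags --resources $SUBNETS \
  --tags "Key=karpenter.sh/discovery,Value=$CLUSTER"

# Tag the cluster security group for securityGroupSelectorTerms
SG=$(aws eks describe-cluster --name "$CLUSTER" \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)
aws ec2 create-tags --resources "$SG" \
  --tags "Key=karpenter.sh/discovery,Value=$CLUSTER"
```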
---
## Quick Start

### 1. Install Karpenter (Helm)
```bash
# Install Karpenter v1.0+ from the official OCI registry
# (the legacy https://charts.karpenter.sh Helm repo is deprecated and does not host v1 charts)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.0.0 \
  --namespace kube-system \
  --set settings.clusterName=my-cluster \
  --set settings.interruptionQueue=my-cluster \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait
```
See: [references/installation.md](references/installation.md) for complete setup including IRSA/Pod Identity
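Before creating NodePools, it is worth confirming the controller came up cleanly; a quick check, assuming the kube-system install above:

```bash
# Controller pods should be Running
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter

# The v1 CRDs should be present
kubectl get crd nodepools.karpenter.sh nodeclaims.karpenter.sh ec2nodeclasses.karpenter.k8s.aws
```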
### 2. Create NodePool and EC2NodeClass
NodePool (defines scheduling requirements and limits):
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]  # Compute, general-purpose, memory-optimized
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]  # Gen 5+
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "1000"
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # v1 name (was WhenUnderutilized in v1beta1)
    consolidateAfter: 30s
    budgets:
      - nodes: "10%"
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023  # Amazon Linux 2023
  amiSelectorTerms:
    - alias: al2023@latest  # Required in v1
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true
```
```bash
kubectl apply -f nodepool.yaml
```
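Before deploying workloads, check that both resources were accepted; a small sketch that reads the v1 status conditions (both objects should report a Ready condition):

```bash
# Inspect readiness of the NodePool and EC2NodeClass
kubectl get nodepool default -o jsonpath='{.status.conditions}'; echo
kubectl get ec2nodeclass default -o jsonpath='{.status.conditions}'; echo
```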
See: [references/nodepools.md](references/nodepools.md) for advanced NodePool patterns
### 3. Deploy Workload and Watch Autoscaling
```bash
# Deploy test workload
kubectl create deployment inflate --image=public.ecr.aws/eks-distro/kubernetes/pause:3.7 \
  --replicas=0
# Scale up to trigger node provisioning
kubectl scale deployment inflate --replicas=10
# Watch Karpenter provision nodes (seconds!)
kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller
# Verify nodes
kubectl get nodes -l karpenter.sh/nodepool=default
# Scale down to trigger consolidation
kubectl scale deployment inflate --replicas=0
# Watch Karpenter consolidate (30s after scale-down)
kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller
```
### 4. Monitor and Optimize
```bash
# Check NodePool status
kubectl get nodepools
# View disruption metrics
kubectl describe nodepool default
# Monitor provisioning decisions
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep -i "launched\|terminated"
# Cost optimization metrics
kubectl top nodes
```
See: [references/optimization.md](references/optimization.md) for cost optimization strategies
---
## Core Concepts

### Karpenter v1.0 Architecture
Key Resources (v1.0+):
- NodePool: Defines node scheduling requirements, limits, and disruption policies
- EC2NodeClass: AWS-specific configuration (AMIs, instance types, subnets, security groups)
- NodeClaim: Karpenter's representation of a node request (auto-created)
How It Works:
1. A pod becomes unschedulable
2. Karpenter evaluates the pod's requirements (CPU, memory, affinity, taints/tolerations)
3. Karpenter selects the optimal instance type from 600+ options
4. Karpenter provisions an EC2 instance directly (no node groups)
5. The node joins the cluster in 30-60 seconds
6. The pod is scheduled to the new node
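One way to watch this sequence end to end, using standard kubectl watches:

```bash
# Terminal 1: watch NodeClaims get created and registered
kubectl get nodeclaims -w

# Terminal 2: watch pending pods get scheduled as nodes join
kubectl get pods --field-selector=status.phase=Pending -w
```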
Consolidation:
- Continuously monitors node utilization
- Consolidates underutilized nodes (bin-packing)
- Drains and deletes empty nodes
- Replaces nodes with cheaper alternatives
- Respects Pod Disruption Budgets
### NodePool vs Cluster Autoscaler Node Groups
| Feature | Karpenter NodePool | Cluster Autoscaler |
|---------|-------------------|-------------------|
| Provisioning Speed | 30-60 seconds | 2-5 minutes |
| Instance Selection | Automatic (600+ types) | Manual (pre-defined) |
| Bin-Packing | Intelligent | Limited |
| Spot Integration | Built-in, intelligent | Requires node groups |
| Consolidation | Automatic | Manual |
| Configuration | Single NodePool | Multiple node groups |
| Cost Savings | 20-70% | 10-20% |
---
## Common Workflows

### Workflow 1: Install Karpenter with Terraform
Use case: Production-grade installation with infrastructure as code
```hcl
# Karpenter module
module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.0"

  cluster_name           = module.eks.cluster_name
  irsa_oidc_provider_arn = module.eks.oidc_provider_arn

  # Enable Pod Identity (2025 recommended)
  enable_pod_identity = true

  # Additional IAM policies
  node_iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }

  tags = {
    Environment = "production"
  }
}

# Helm release
resource "helm_release" "karpenter" {
  namespace  = "kube-system"
  name       = "karpenter"
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = "1.0.0"

  set {
    name  = "settings.clusterName"
    value = module.eks.cluster_name
  }

  set {
    name  = "settings.interruptionQueue"
    value = module.karpenter.queue_name
  }

  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.karpenter.iam_role_arn
  }
}
```
Steps:
1. Review [references/installation.md](references/installation.md)
2. Configure the Terraform module with cluster details
3. Apply infrastructure: `terraform apply`
4. Verify installation: `kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter`
5. Tag subnets and security groups for discovery
6. Deploy a NodePool and EC2NodeClass

See: [references/installation.md](references/installation.md) for complete Terraform setup
---
### Workflow 2: Configure Spot/On-Demand Mix (30/70)
Use case: Optimize costs while maintaining availability (recommended: 30% On-Demand, 70% Spot)
Critical NodePool (On-Demand only):
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: critical
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      taints:
        - key: "critical"
          value: "true"
          effect: "NoSchedule"
  limits:
    cpu: "200"
  weight: 100  # Higher priority
```
Flexible NodePool (Spot preferred):
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: flexible
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "800"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
      - nodes: "20%"
  weight: 10  # Lower priority (used after critical)
```
Pod tolerations for critical workloads:
```yaml
spec:
  tolerations:
    - key: "critical"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  nodeSelector:
    karpenter.sh/capacity-type: on-demand
```
Steps:
1. Create a critical NodePool for databases and stateful apps (On-Demand)
2. Create a flexible NodePool for stateless apps (Spot preferred)
3. Use taints/tolerations to separate critical workloads
4. Monitor Spot interruptions: `kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep -i interrupt`
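To verify the resulting mix, surface the capacity-type label that Karpenter sets on each node:

```bash
# Show each node's capacity type and owning NodePool
kubectl get nodes -L karpenter.sh/capacity-type,karpenter.sh/nodepool
```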
See: [references/nodepools.md](references/nodepools.md) for Spot strategies
---
### Workflow 3: Enable Consolidation for Cost Savings
Use case: Reduce costs by automatically consolidating underutilized nodes
Aggressive consolidation (development/staging):
```yaml
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s  # Consolidate quickly
    budgets:
      - nodes: "50%"  # Allow disrupting 50% of nodes
```
Conservative consolidation (production):
```yaml
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m  # Wait 5 minutes before consolidating
    budgets:
      - nodes: "10%"  # Limit disruption to 10% of nodes at a time
      - schedule: "0 9 * * mon-fri"  # Business hours (09:00-17:00)
        duration: 8h
        nodes: "20%"
      - schedule: "0 18 * * *"  # Off-hours (18:00-09:00)
        duration: 15h
        nodes: "5%"
```
Pod Disruption Budget (protect critical pods):
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-app
```
Steps:
1. Review [references/optimization.md](references/optimization.md)
2. Set the consolidation policy (WhenEmpty or WhenEmptyOrUnderutilized in v1)
3. Configure the consolidateAfter delay (30s-5m)
4. Set disruption budgets (% of nodes)
5. Create PodDisruptionBudgets for critical apps
6. Monitor consolidation: `kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep consolidat`

Expected savings: 15-30% additional reduction beyond Spot savings
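A rough way to see how much headroom consolidation can reclaim, using only standard kubectl output:

```bash
# Requested vs allocatable resources per node; low "Requests" percentages
# indicate candidates for consolidation
kubectl describe nodes | grep -A 8 "Allocated resources"
```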
See: [references/optimization.md](references/optimization.md) for consolidation best practices
---
### Workflow 4: Migrate from Cluster Autoscaler
Use case: Upgrade from Cluster Autoscaler to Karpenter for better performance and cost savings
Migration strategy (zero-downtime):
1. Install Karpenter (runs alongside Cluster Autoscaler)
```bash
helm install karpenter oci://public.ecr.aws/karpenter/karpenter --namespace kube-system
```
2. Create a NodePool with distinct labels
```yaml
spec:
  template:
    metadata:
      labels:
        provisioner: karpenter
```
3. Migrate workloads gradually
```yaml
# Add a node selector to new deployments
spec:
  nodeSelector:
    provisioner: karpenter
```
4. Monitor both autoscalers
```bash
# Watch Karpenter
kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter
# Watch Cluster Autoscaler
kubectl logs -f -n kube-system -l app=cluster-autoscaler
```
5. Gradually scale down CA node groups
```bash
# Reduce desired size of CA node groups
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name ca-nodes \
  --scaling-config desiredSize=1,minSize=0,maxSize=3
```
6. Remove Cluster Autoscaler tags
```bash
# Remove the autoscaler discovery tags from the node groups:
#   k8s.io/cluster-autoscaler/enabled
#   k8s.io/cluster-autoscaler/<cluster-name>
```
7. Uninstall Cluster Autoscaler
```bash
helm uninstall cluster-autoscaler -n kube-system
```
Testing checklist:
- [ ] Karpenter provisions nodes successfully
- [ ] Pods schedule on Karpenter nodes
- [ ] Consolidation works as expected
- [ ] Spot interruptions handled gracefully
- [ ] No unschedulable pods
- [ ] Cost metrics show improvement
Rollback plan: Keep CA node groups at min size until confident in Karpenter
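Once workloads carry the `provisioner: karpenter` node selector, the remaining CA-managed nodes can be drained so their pods reschedule onto Karpenter capacity; a sketch assuming the managed node group `ca-nodes` used above:

```bash
# Drain Cluster Autoscaler nodes one by one (respects PodDisruptionBudgets)
for node in $(kubectl get nodes -l eks.amazonaws.com/nodegroup=ca-nodes -o name); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done
```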
---
### Workflow 5: GPU Node Provisioning
Use case: Automatically provision GPU instances for ML workloads
GPU NodePool:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]  # GPU capacity typically runs On-Demand
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g4dn", "g5", "p3", "p4d"]
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: Gt
          values: ["0"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu
      taints:
        - key: "nvidia.com/gpu"
          value: "true"
          effect: "NoSchedule"
  limits:
    cpu: "1000"
    nvidia.com/gpu: "8"
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  amiFamily: AL2  # AL2 GPU variant ships with NVIDIA drivers
  amiSelectorTerms:
    - alias: al2@latest  # Latest GPU-enabled AMI
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  userData: |
    #!/bin/bash
    /etc/eks/bootstrap.sh my-cluster
```
GPU workload:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
  containers:
    - name: cuda-container
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```
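For pods to request `nvidia.com/gpu`, the NVIDIA device plugin DaemonSet must be running on the GPU nodes; the upstream manifest can be applied directly (the version pin below is an assumption, check the project's releases):

```bash
# Deploy the NVIDIA device plugin so nodes advertise nvidia.com/gpu
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
```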
See: [references/nodepools.md](references/nodepools.md) for GPU configuration details
---
## Key Configuration

### NodePool Resource Limits
Prevent runaway scaling:
```yaml
spec:
  limits:
    cpu: "1000"          # Max 1000 CPUs across all nodes in this pool
    memory: "1000Gi"     # Max 1000Gi memory
    nvidia.com/gpu: "8"  # Max 8 GPUs
```
### Disruption Controls
Balance cost savings with stability:
```yaml
spec:
  template:
    spec:
      # Node expiration (security patching); lives under template.spec in v1
      expireAfter: 720h  # 30 days
  disruption:
    # When to consolidate (v1 values)
    consolidationPolicy: WhenEmpty | WhenEmptyOrUnderutilized
    # Delay before consolidating (prevents flapping)
    consolidateAfter: 30s  # Default: 30s
    # Disruption budgets (rate limiting)
    budgets:
      - nodes: "10%"  # Max 10% of nodes disrupted at once
        reasons:
          - Underutilized
          - Empty
      - schedule: "0 0 * * *"  # Off-hours: more aggressive
        duration: 8h
        nodes: "50%"
```
### Instance Type Flexibility
Maximize Spot availability and cost savings:
```yaml
spec:
  template:
    spec:
      requirements:
        # Architecture
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]  # Include ARM for savings
        # Instance categories (c=compute, m=general, r=memory)
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        # Instance generation (5+ for best performance/cost)
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
        # Instance size (exclude large sizes if not needed)
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["metal", "32xlarge", "24xlarge"]
        # Capacity type
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
```
Result: Karpenter selects from 600+ instance types, maximizing Spot availability
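To see which instance types Karpenter actually picked under these requirements, surface the standard node labels:

```bash
# Instance type, zone, and capacity type for every Karpenter-provisioned node
kubectl get nodes -l karpenter.sh/nodepool \
  -L node.kubernetes.io/instance-type,topology.kubernetes.io/zone,karpenter.sh/capacity-type
```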
---
## Monitoring and Troubleshooting

### Key Metrics
```bash
# NodePool status
kubectl get nodepools
# NodeClaim status (pending provisions)
kubectl get nodeclaims
# Node events
kubectl get events --field-selector involvedObject.kind=Node
# Karpenter controller logs
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -c controller --tail=100
# Filter for provisioning decisions
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep "launched instance"
# Filter for consolidation events
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep "consolidating"
# Spot interruption warnings
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep "interrupt"
```
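The controller also exposes Prometheus metrics; a hedged sketch, assuming the Helm chart's default metrics port of 8000:

```bash
# Port-forward the controller's metrics endpoint and sample Karpenter metrics
kubectl port-forward -n kube-system deploy/karpenter 8000:8000 &
curl -s localhost:8000/metrics | grep '^karpenter_' | head
```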
### Common Issues
1. Nodes not provisioning:
```bash
# Check NodePool status
kubectl describe nodepool default
# Check for unschedulable pods
kubectl get pods -A --field-selector=status.phase=Pending
# Review Karpenter logs for errors
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep -i error
```
Common causes:
- Insufficient IAM permissions
- Subnet/security group tags missing
- Resource limits exceeded
- No instance types match requirements
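The tag-related causes can be ruled out quickly from the CLI; a sketch assuming the `my-cluster` discovery tag value used earlier:

```bash
# Both queries should return at least one ID
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=my-cluster" \
  --query 'Subnets[].SubnetId'
aws ec2 describe-security-groups \
  --filters "Name=tag:karpenter.sh/discovery,Values=my-cluster" \
  --query 'SecurityGroups[].GroupId'
```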
2. Excessive consolidation (pod restarts):
```yaml
# Increase the consolidateAfter delay
spec:
  disruption:
    consolidateAfter: 5m  # Increased from 30s
```
3. Spot interruptions causing issues:
```yaml
# Reduce the Spot ratio
- key: karpenter.sh/capacity-type
  operator: In
  values: ["on-demand"]  # Use more On-Demand
```
---
## Best Practices

### Cost Optimization

- ✅ Use 30% On-Demand, 70% Spot for an optimal cost/stability balance
- ✅ Enable consolidation (WhenEmptyOrUnderutilized)
- ✅ Include ARM instances (Graviton) for ~20% additional savings
- ✅ Set instance generation > 4 for the best price/performance
- ✅ Use multiple instance families (c, m, r) for Spot diversity

### Reliability

- ✅ Set Pod Disruption Budgets for critical applications
- ✅ Use multiple availability zones
- ✅ Configure disruption budgets (10-20% for production)
- ✅ Test Spot interruption handling
- ✅ Use On-Demand for stateful workloads (databases)

### Security

- ✅ Use IRSA or Pod Identity (not node IAM roles)
- ✅ Enable EBS encryption in EC2NodeClass
- ✅ Set expireAfter for regular node rotation (720h/30 days)
- ✅ Use Amazon Linux 2023 (AL2023) AMIs
- ✅ Tag resources for cost allocation

### Performance

- ✅ Run the Karpenter controller on a small dedicated node group (On-Demand, not Karpenter-managed)
- ✅ Set appropriate resource limits to prevent runaway scaling
- ✅ Monitor provisioning latency (should be <60s; see the sketch below)
- ✅ Use topology spread constraints for pod distribution
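Provisioning latency can be approximated from object creation timestamps; a rough sketch using standard kubectl custom columns:

```bash
# Compare when NodeClaims were created vs when their nodes appeared
kubectl get nodeclaims -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
kubectl get nodes -l karpenter.sh/nodepool \
  -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
```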
---
## Reference Documentation

Detailed Guides (load on-demand):
- [references/installation.md](references/installation.md) - Complete installation with Helm, Terraform, IRSA, Pod Identity
- [references/nodepools.md](references/nodepools.md) - NodePool and EC2NodeClass configuration patterns
- [references/optimization.md](references/optimization.md) - Cost optimization, consolidation, disruption budgets
Official Resources:
- [Karpenter Documentation](https://karpenter.sh/)
- [AWS Karpenter Best Practices](https://aws.github.io/aws-eks-best-practices/karpenter/)
- [Karpenter GitHub](https://github.com/aws/karpenter)
Community Examples:
- [Terraform EKS Karpenter Module](https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest/submodules/karpenter)
- [Karpenter Blueprints](https://github.com/aws-samples/karpenter-blueprints)
---
## Quick Reference

### Installation
```bash
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.0.0 \
  --namespace kube-system \
  --set settings.clusterName=my-cluster
```
### Basic NodePool
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "1000"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```
### Monitor
```bash
kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter
```
### Cost Savings Formula
- Spot instances: 70-80% savings vs On-Demand
- Consolidation: Additional 15-30% reduction
- Better bin-packing: 10-20% waste reduction
- Total: 20-70% overall cost reduction
---
Next Steps: Install Karpenter using [references/installation.md](references/installation.md), then configure NodePools with [references/nodepools.md](references/nodepools.md)