auto-claude-optimization

Skill from adaptationio/skrillz
What it does

Optimizes Claude AI performance by reducing token usage, managing API costs, and improving build speed through intelligent model and context selection.


Installation

Install skill:
npx skills add https://github.com/adaptationio/skrillz --skill auto-claude-optimization
Last Updated: Jan 16, 2026

Skill Details

SKILL.md

Auto-Claude performance optimization and cost management. Use when optimizing token usage, reducing API costs, improving build speed, or tuning agent performance.

Overview

# Auto-Claude Optimization

Performance tuning, cost reduction, and efficiency improvements.

Performance Overview

Key Metrics

| Metric | Impact | Optimization |
|--------|--------|--------------|
| API latency | Build speed | Model selection, caching |
| Token usage | Cost | Prompt efficiency, context limits |
| Memory queries | Speed | Embedding model, index tuning |
| Build iterations | Time | Spec quality, QA settings |

Model Optimization

Model Selection

| Model | Speed | Cost | Quality | Use Case |
|-------|-------|------|---------|----------|
| claude-opus-4-5-20251101 | Slow | High | Best | Complex features |
| claude-sonnet-4-5-20250929 | Fast | Medium | Good | Standard features |

```bash
# Override the model in .env
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
```

Extended Thinking Tokens

Configure thinking budget per agent:

| Agent | Default | Recommended |
|-------|---------|-------------|
| Spec creation | 16000 | Keep default for quality |
| Planning | 5000 | Reduce to 3000 for speed |
| Coding | 0 | Keep disabled |
| QA Review | 10000 | Reduce to 5000 for speed |

```python
# In agent configuration
max_thinking_tokens = 5000  # or None to disable
```

Token Optimization

Reduce Context Size

  1. Smaller spec files

```bash
# Keep specs concise
# Bad: a 5000-word spec
# Good: a 500-word spec with clear criteria
```

  2. Limit codebase scanning

```python
# In context/builder.py
MAX_CONTEXT_FILES = 50  # reduce from 100
```

  3. Use targeted searches

```bash
# Instead of a full codebase scan,
# focus on the relevant directories
```

Efficient Prompts

Optimize system prompts in apps/backend/prompts/:

```markdown
<!-- Before (verbose): -->
You are an expert software developer who specializes in building
high-quality, production-ready applications. You have extensive
experience with many programming languages and frameworks...

<!-- After (concise): -->
Expert full-stack developer. Build production-quality code.
Follow existing patterns. Test thoroughly.
```
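A quick way to see the savings is a rough token estimate. A common rule of thumb is ~1.3 tokens per whitespace-separated word; the real count depends on the model's tokenizer, so treat this as an approximation:

```python
def rough_tokens(text: str) -> int:
    """Rough token estimate: ~1.3 tokens per whitespace word.
    Approximation only; actual counts depend on the tokenizer."""
    return round(len(text.split()) * 1.3)

verbose = ("You are an expert software developer who specializes in building "
           "high-quality, production-ready applications. You have extensive "
           "experience with many programming languages and frameworks...")
concise = ("Expert full-stack developer. Build production-quality code. "
           "Follow existing patterns. Test thoroughly.")

print(rough_tokens(verbose), rough_tokens(concise))
```

Because the system prompt is sent on every API call, even a modest per-prompt reduction compounds across a long build.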

Memory Optimization

```bash
# Use an efficient embedding model
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Or offline with a smaller model
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384
```

Speed Optimization

Parallel Execution

```bash
# Enable more parallel agents (default: 4)
MAX_PARALLEL_AGENTS=8
```

Reduce QA Iterations

```bash
# Limit QA loop iterations
MAX_QA_ITERATIONS=10  # default: 50

# Skip QA for quick iterations
python run.py --spec 001 --skip-qa
```

Faster Spec Creation

```bash
# Force simple complexity for quick tasks
python spec_runner.py --task "Fix typo" --complexity simple

# Skip the research phase
SKIP_RESEARCH_PHASE=true python spec_runner.py --task "..."
```

API Timeout Tuning

```bash
# Reduce the timeout for faster failure detection
API_TIMEOUT_MS=120000  # 2 minutes (default: 10 minutes)
```

Cost Management

Monitor Token Usage

```bash
# Enable cost tracking
ENABLE_COST_TRACKING=true

# View usage report
python usage_report.py --spec 001
```

Cost Reduction Strategies

  1. Use cheaper models for simple tasks

```bash
# For simple specs
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python spec_runner.py --task "..."
```

  2. Limit the context window

```bash
MAX_CONTEXT_TOKENS=50000  # reduce from 100000
```

  3. Batch similar tasks

```bash
# Create specs together, run together
python spec_runner.py --task "Add feature A"
python spec_runner.py --task "Add feature B"
python run.py --spec 001
python run.py --spec 002
```

  4. Use local models for memory

```bash
# Ollama for memory (free)
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
```

Cost Estimation

| Operation | Estimated Tokens | Cost (Opus) | Cost (Sonnet) |
|-----------|------------------|-------------|---------------|
| Simple spec | 10k | ~$0.30 | ~$0.06 |
| Standard spec | 50k | ~$1.50 | ~$0.30 |
| Complex spec | 200k | ~$6.00 | ~$1.20 |
| Build (simple) | 50k | ~$1.50 | ~$0.30 |
| Build (standard) | 200k | ~$6.00 | ~$1.20 |
| Build (complex) | 500k | ~$15.00 | ~$3.00 |
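The table implies blended rates of roughly $30 per million tokens for Opus and $6 for Sonnet. A small estimator using those implied rates — approximate, since actual pricing depends on the input/output token mix:

```python
# Blended per-million-token rates implied by the table above
# (approximate; real pricing depends on the input/output mix).
RATE_PER_MTOK = {"opus": 30.0, "sonnet": 6.0}

def estimate_cost(tokens: int, model: str) -> float:
    """Estimate USD cost for a token count at the table's blended rate."""
    return round(tokens / 1_000_000 * RATE_PER_MTOK[model], 2)
```

For example, a 200k-token standard build comes out to about $6.00 on Opus versus $1.20 on Sonnet, matching the table.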

Memory System Optimization

Embedding Performance

```bash
# Faster embeddings
OPENAI_EMBEDDING_MODEL=text-embedding-3-small  # 1536 dim, fast

# Higher quality (slower)
OPENAI_EMBEDDING_MODEL=text-embedding-3-large  # 3072 dim

# Offline (fastest, free)
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384
```

Query Optimization

```python
# Limit search results
memory.search("query", limit=10)  # instead of 100

# Enable semantic caching via environment variable:
# ENABLE_MEMORY_CACHE=true
```

Database Maintenance

```bash
# Compact the database periodically
python -c "from integrations.graphiti.memory import compact_database; compact_database()"

# Clear old episodes
python query_memory.py --cleanup --older-than 30d
```

Build Efficiency

Spec Quality = Build Speed

High-quality specs reduce iterations:

```markdown
# Good spec (fewer iterations)
Acceptance Criteria
- [ ] User can log in with email/password
- [ ] Invalid credentials show an error message
- [ ] Successful login redirects to /dashboard
- [ ] Session persists for 24 hours

# Bad spec (more iterations)
Acceptance Criteria
- [ ] Login works
```

Subtask Granularity

Optimal subtask size:

  • Too large: Agent gets stuck, needs recovery
  • Too small: Overhead per subtask
  • Optimal: 30-60 minutes of work each

Parallel Work

Let agents spawn subagents for parallel execution:

```
Main Coder
β”œβ”€β”€ Subagent 1: Frontend (parallel)
β”œβ”€β”€ Subagent 2: Backend (parallel)
└── Subagent 3: Tests (parallel)
```

Environment Tuning

Optimal .env Configuration

```bash
# Performance-focused configuration
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
API_TIMEOUT_MS=180000
MAX_PARALLEL_AGENTS=6

# Memory optimization
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
OLLAMA_LLM_MODEL=llama3.2:3b
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

# Reduce verbosity
DEBUG=false
ENABLE_FANCY_UI=false
```

Resource Limits

```bash
# Limit Python memory
export PYTHONMALLOC=malloc

# Set max file descriptors
ulimit -n 4096
```

Benchmarking

Measure Build Time

```bash
# Time a build
time python run.py --spec 001

# Compare models
time AUTO_BUILD_MODEL=claude-opus-4-5-20251101 python run.py --spec 001
time AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python run.py --spec 001
```
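For less noisy comparisons than a single `time` run, a small harness can take the median over several runs. A sketch — wrap the build invocation (for example a `subprocess.run` of `python run.py --spec 001`) in the callable you pass in:

```python
import time
from statistics import median

def benchmark(fn, runs: int = 3) -> float:
    """Return the median wall-clock seconds of `fn` over several runs.
    Pass a zero-argument callable wrapping the command to measure."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)
```

The median is less sensitive to one-off spikes (cold caches, network jitter) than a single measurement or the mean.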

Profile Memory Usage

```bash
# Monitor memory
watch -n 1 'ps aux | grep python | head -5'

# Profile a script
python -m cProfile -o profile.stats run.py --spec 001
python -c "import pstats; p = pstats.Stats('profile.stats'); p.sort_stats('cumulative').print_stats(20)"
```

Quick Wins

Immediate Optimizations

  1. Switch to Sonnet for most tasks

```bash
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
```

  2. Use Ollama for memory

```bash
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
```

  3. Skip QA for prototypes

```bash
python run.py --spec 001 --skip-qa
```

  4. Force simple complexity for small tasks

```bash
python spec_runner.py --task "..." --complexity simple
```

Medium-Term Improvements

  1. Optimize prompts in apps/backend/prompts/
  2. Configure project-specific security allowlist
  3. Set up memory caching
  4. Tune parallel agent count

Long-Term Strategies

  1. Self-hosted LLM for memory (Ollama)
  2. Caching layer for common operations
  3. Incremental context building
  4. Project-specific prompt optimization

Related Skills

  • auto-claude-memory: Memory configuration
  • auto-claude-build: Build process
  • auto-claude-troubleshooting: Debugging
