auto-claude-optimization
Skill from adaptationio/skrillz
Optimizes Claude AI performance by reducing token usage, managing API costs, and improving build speed through intelligent model and context selection.
Installation
```bash
npx skills add https://github.com/adaptationio/skrillz --skill auto-claude-optimization
```
Skill Details
Auto-Claude performance optimization and cost management. Use when optimizing token usage, reducing API costs, improving build speed, or tuning agent performance.
Overview
# Auto-Claude Optimization
Performance tuning, cost reduction, and efficiency improvements.
Performance Overview
Key Metrics
| Metric | Impact | Optimization |
|--------|--------|--------------|
| API latency | Build speed | Model selection, caching |
| Token usage | Cost | Prompt efficiency, context limits |
| Memory queries | Speed | Embedding model, index tuning |
| Build iterations | Time | Spec quality, QA settings |
Model Optimization
Model Selection
| Model | Speed | Cost | Quality | Use Case |
|-------|-------|------|---------|----------|
| claude-opus-4-5-20251101 | Slow | High | Best | Complex features |
| claude-sonnet-4-5-20250929 | Fast | Medium | Good | Standard features |
```bash
# Override model in .env
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
```
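As a sketch of how this choice might be automated (the `pick_model` helper and complexity keys are illustrative, not part of Auto-Claude), the table above maps naturally onto a small lookup that still respects the `AUTO_BUILD_MODEL` override:
```python
import os

# Illustrative complexity-to-model mapping, mirroring the table above.
MODELS = {
    "complex": "claude-opus-4-5-20251101",
    "standard": "claude-sonnet-4-5-20250929",
}

def pick_model(complexity: str) -> str:
    """Choose a model by task complexity; an AUTO_BUILD_MODEL override wins."""
    return os.environ.get("AUTO_BUILD_MODEL", MODELS.get(complexity, MODELS["standard"]))
```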
Extended Thinking Tokens
Configure thinking budget per agent:
| Agent | Default | Recommended |
|-------|---------|-------------|
| Spec creation | 16000 | Keep default for quality |
| Planning | 5000 | Reduce to 3000 for speed |
| Coding | 0 | Keep disabled |
| QA Review | 10000 | Reduce to 5000 for speed |
```python
# In agent configuration
max_thinking_tokens=5000 # or None to disable
```
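A minimal sketch of wiring the recommended budgets together (the `AGENT_THINKING_BUDGETS` mapping and lookup helper are hypothetical, for illustration only):
```python
# Hypothetical per-agent budgets, mirroring the recommendations above.
# None disables extended thinking entirely.
AGENT_THINKING_BUDGETS = {
    "spec_creation": 16000,  # keep default for quality
    "planning": 3000,        # reduced from 5000 for speed
    "coding": None,          # keep disabled
    "qa_review": 5000,       # reduced from 10000 for speed
}

def thinking_budget(agent: str):
    """Return the max_thinking_tokens value for an agent (None = disabled)."""
    return AGENT_THINKING_BUDGETS.get(agent)
```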
Token Optimization
Reduce Context Size
- Smaller spec files
```bash
# Keep specs concise
# Bad: 5000 word spec
# Good: 500 word spec with clear criteria
```
- Limit codebase scanning
```python
# In context/builder.py
MAX_CONTEXT_FILES = 50 # Reduce from 100
```
- Use targeted searches (see the sketch after this list)
```bash
# Instead of full codebase scan
# Focus on relevant directories
```
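A minimal sketch of the targeted scan referenced above (the directory list and helper are assumptions for illustration; the real context builder may differ):
```python
from pathlib import Path

MAX_CONTEXT_FILES = 50
RELEVANT_DIRS = ["apps/backend/api", "apps/backend/models"]  # hypothetical

def gather_context_files(root: str) -> list[Path]:
    """Collect files only from task-relevant directories, capped at a limit."""
    files: list[Path] = []
    for d in RELEVANT_DIRS:
        files.extend(sorted((Path(root) / d).rglob("*.py")))
        if len(files) >= MAX_CONTEXT_FILES:
            break
    return files[:MAX_CONTEXT_FILES]
```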
Efficient Prompts
Optimize system prompts in apps/backend/prompts/:
```markdown
# Before (verbose, wastes tokens on every call)
You are an expert software developer who specializes in building
high-quality, production-ready applications. You have extensive
experience with many programming languages and frameworks...

# After (concise, same intent)
Expert full-stack developer. Build production-quality code.
Follow existing patterns. Test thoroughly.
```
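To sanity-check a rewrite, a rough character-based heuristic (~4 characters per token for English text; an approximation, not an exact tokenizer) is enough to compare variants:
```python
def approx_tokens(text: str) -> int:
    """Rough token estimate using the ~4 chars/token heuristic for English."""
    return max(1, len(text) // 4)

verbose = (
    "You are an expert software developer who specializes in building "
    "high-quality, production-ready applications. You have extensive "
    "experience with many programming languages and frameworks..."
)
concise = (
    "Expert full-stack developer. Build production-quality code. "
    "Follow existing patterns. Test thoroughly."
)
print(approx_tokens(verbose), approx_tokens(concise))  # roughly 48 vs 25
```
Because the system prompt is resent on every API call, the saving compounds across an entire build.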
Memory Optimization
```bash
# Use efficient embedding model
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Or offline with smaller model
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384
```
Speed Optimization
Parallel Execution
```bash
# Enable more parallel agents (default: 4)
MAX_PARALLEL_AGENTS=8
```
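Conceptually, the cap behaves like a semaphore around agent tasks. A minimal sketch, assuming an asyncio-based scheduler (the real orchestrator may differ):
```python
import asyncio
import os

MAX_PARALLEL_AGENTS = int(os.environ.get("MAX_PARALLEL_AGENTS", "4"))

async def run_agent(name: str, sem: asyncio.Semaphore) -> None:
    async with sem:             # at most MAX_PARALLEL_AGENTS run at once
        await asyncio.sleep(1)  # stand-in for real agent work
        print(f"{name} finished")

async def main() -> None:
    sem = asyncio.Semaphore(MAX_PARALLEL_AGENTS)
    await asyncio.gather(*(run_agent(f"agent-{i}", sem) for i in range(10)))

asyncio.run(main())
```
Raising the cap trades memory and API rate-limit headroom for wall-clock speed, so increase it gradually.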
Reduce QA Iterations
```bash
# Limit QA loop iterations
MAX_QA_ITERATIONS=10 # Default: 50
# Skip QA for quick iterations
python run.py --spec 001 --skip-qa
```
Faster Spec Creation
```bash
# Force simple complexity for quick tasks
python spec_runner.py --task "Fix typo" --complexity simple
# Skip research phase
SKIP_RESEARCH_PHASE=true python spec_runner.py --task "..."
```
API Timeout Tuning
```bash
# Reduce timeout for faster failure detection
API_TIMEOUT_MS=120000 # 2 minutes (default: 10 minutes)
```
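Since the value is in milliseconds, 120000 is 2 minutes against the 600000 ms (10 minute) default. A tiny sketch of the conversion a client would apply (only the variable name comes from the config above):
```python
import os

# API_TIMEOUT_MS is milliseconds; most HTTP clients expect seconds.
timeout_s = int(os.environ.get("API_TIMEOUT_MS", "600000")) / 1000
print(f"Requests fail after {timeout_s:.0f}s")
```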
Cost Management
Monitor Token Usage
```bash
# Enable cost tracking
ENABLE_COST_TRACKING=true
# View usage report
python usage_report.py --spec 001
```
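The on-disk format of the tracking data isn't documented here; as a purely hypothetical illustration, per-spec usage could be recorded as simple records and summed for a report:
```python
# Hypothetical usage records; the real tracker's schema may differ.
records = [
    {"spec": "001", "input_tokens": 42_000, "output_tokens": 8_000},
    {"spec": "001", "input_tokens": 11_000, "output_tokens": 2_500},
]

total = sum(r["input_tokens"] + r["output_tokens"] for r in records)
print(f"spec 001 total: {total:,} tokens")  # -> spec 001 total: 63,500 tokens
```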
Cost Reduction Strategies
- Use cheaper models for simple tasks
```bash
# For simple specs
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python spec_runner.py --task "..."
```
- Limit context window
```bash
MAX_CONTEXT_TOKENS=50000 # Reduce from 100000
```
- Batch similar tasks
```bash
# Create specs together, run together
python spec_runner.py --task "Add feature A"
python spec_runner.py --task "Add feature B"
python run.py --spec 001
python run.py --spec 002
```
- Use local models for memory
```bash
# Ollama for memory (free)
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
```
Cost Estimation
| Operation | Estimated Tokens | Cost (Opus) | Cost (Sonnet) |
|-----------|-----------------|-------------|---------------|
| Simple spec | 10k | ~$0.30 | ~$0.06 |
| Standard spec | 50k | ~$1.50 | ~$0.30 |
| Complex spec | 200k | ~$6.00 | ~$1.20 |
| Build (simple) | 50k | ~$1.50 | ~$0.30 |
| Build (standard) | 200k | ~$6.00 | ~$1.20 |
| Build (complex) | 500k | ~$15.00 | ~$3.00 |
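The table implies blended rates of roughly $30 per million tokens for Opus and $6 for Sonnet (e.g. 200k tokens x $30/M = $6.00). Treat these as rough planning numbers, since real pricing splits input and output tokens; a quick estimator:
```python
# Blended USD-per-million-token rates implied by the table above.
# Real pricing is split by input/output tokens, so these are rough.
BLENDED_RATE_PER_MTOK = {"opus": 30.0, "sonnet": 6.0}

def estimate_cost(tokens: int, model: str) -> float:
    return tokens / 1_000_000 * BLENDED_RATE_PER_MTOK[model]

print(f"${estimate_cost(200_000, 'opus'):.2f}")    # $6.00, matches the table
print(f"${estimate_cost(500_000, 'sonnet'):.2f}")  # $3.00
```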
Memory System Optimization
Embedding Performance
```bash
# Faster embeddings
OPENAI_EMBEDDING_MODEL=text-embedding-3-small # 1536 dim, fast
# Higher quality (slower)
OPENAI_EMBEDDING_MODEL=text-embedding-3-large # 3072 dim
# Offline (fastest, free)
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384
```
Query Optimization
```python
# Limit search results (instead of the default 100)
results = memory.search("query", limit=10)
```
```bash
# Use semantic caching
ENABLE_MEMORY_CACHE=true
```
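An exact-match cache is the simplest version of this idea. A minimal sketch (the `_raw_search` stub stands in for the real memory search; a true semantic cache would also match near-duplicate queries by embedding similarity):
```python
from functools import lru_cache

def _raw_search(query: str, limit: int) -> list[str]:
    # Stand-in for the real (expensive) embedding + graph lookup.
    return [f"result-{i} for {query!r}" for i in range(limit)]

@lru_cache(maxsize=256)
def cached_search(query: str, limit: int = 10) -> tuple:
    # Repeated identical queries return instantly from the cache.
    return tuple(_raw_search(query, limit))

cached_search("auth flow")  # miss: computes and stores
cached_search("auth flow")  # hit: no search performed
```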
Database Maintenance
```bash
# Compact database periodically
python -c "from integrations.graphiti.memory import compact_database; compact_database()"
# Clear old episodes
python query_memory.py --cleanup --older-than 30d
```
Build Efficiency
Spec Quality = Build Speed
High-quality specs reduce iterations:
```markdown
# Good spec (fewer iterations)
Acceptance Criteria
- [ ] User can log in with email/password
- [ ] Invalid credentials show error message
- [ ] Successful login redirects to /dashboard
- [ ] Session persists for 24 hours

# Bad spec (more iterations)
Acceptance Criteria
- [ ] Login works
```
Subtask Granularity
Optimal subtask size:
- Too large: Agent gets stuck, needs recovery
- Too small: Overhead per subtask
- Optimal: 30-60 minutes of work each
Parallel Work
Let agents spawn subagents for parallel execution:
```
Main Coder
├── Subagent 1: Frontend (parallel)
├── Subagent 2: Backend (parallel)
└── Subagent 3: Tests (parallel)
```
Environment Tuning
Optimal .env Configuration
```bash
# Performance-focused configuration
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
API_TIMEOUT_MS=180000
MAX_PARALLEL_AGENTS=6
# Memory optimization
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
OLLAMA_LLM_MODEL=llama3.2:3b
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384
# Reduce verbosity
DEBUG=false
ENABLE_FANCY_UI=false
```
Resource Limits
```bash
# Limit Python memory
export PYTHONMALLOC=malloc
# Set max file descriptors
ulimit -n 4096
```
Benchmarking
Measure Build Time
```bash
# Time a build
time python run.py --spec 001
# Compare models
time AUTO_BUILD_MODEL=claude-opus-4-5-20251101 python run.py --spec 001
time AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python run.py --spec 001
```
Profile Memory Usage
```bash
# Monitor memory
watch -n 1 'ps aux | grep python | head -5'
# Profile script
python -m cProfile -o profile.stats run.py --spec 001
python -c "import pstats; p = pstats.Stats('profile.stats'); p.sort_stats('cumulative').print_stats(20)"
```
Quick Wins
Immediate Optimizations
- Switch to Sonnet for most tasks
```bash
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
```
- Use Ollama for memory
```bash
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
```
- Skip QA for prototypes
```bash
python run.py --spec 001 --skip-qa
```
- Force simple complexity for small tasks
```bash
python spec_runner.py --task "..." --complexity simple
```
Medium-Term Improvements
- Optimize prompts in apps/backend/prompts/
- Configure project-specific security allowlist
- Set up memory caching
- Tune parallel agent count
Long-Term Strategies
- Self-hosted LLM for memory (Ollama)
- Caching layer for common operations
- Incremental context building
- Project-specific prompt optimization
Related Skills
- auto-claude-memory: Memory configuration
- auto-claude-build: Build process
- auto-claude-troubleshooting: Debugging