end-to-end-orchestrator
Skill from adaptationio/skrillz
Part of adaptationio/skrillz (191 items)
Installation
```
/plugin marketplace add adaptationio/Skrillz
/plugin install skrillz@adaptationio-Skrillz
/plugin enable skrillz@adaptationio-Skrillz
/plugin marketplace add /path/to/skrillz
/plugin install skrillz@local
```
+ 4 more commands
Skill Details
Complete development workflow orchestrator coordinating all multi-ai skills (research → planning → implementation → testing → verification) with quality gates, failure recovery, and state management. Single-command complete workflows from objective to production-ready code. Use when implementing complete features requiring full pipeline, coordinating multiple skills automatically, or executing production-grade development cycles end-to-end.
# End-to-End Orchestrator
Overview
end-to-end-orchestrator provides single-command complete development workflows, coordinating all 5 multi-ai skills from research through production deployment.
Purpose: Transform "I want feature X" into production-ready code through automated skill coordination
Pattern: Workflow-based (5-stage pipeline with quality gates)
Key Innovation: Automatic orchestration of research → planning → implementation → testing → verification with failure recovery and quality gates
The Complete Pipeline:
```
Input: Feature description
  ↓
1. Research (multi-ai-research) [optional]
  ↓  [Quality Gate: Research complete]
2. Planning (multi-ai-planning)
  ↓  [Quality Gate: Plan ≥90/100]
3. Implementation (multi-ai-implementation)
  ↓  [Quality Gate: Tests pass, coverage ≥80%]
4. Testing (multi-ai-testing)
  ↓  [Quality Gate: Coverage ≥95%, verified]
5. Verification (multi-ai-verification)
  ↓  [Quality Gate: Score ≥90/100, all layers pass]
Output: Production-ready code
```
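Conceptually, the orchestrator is a loop over stage runners with a gate check and bounded retries after each stage. The TypeScript sketch below is illustrative only: the `Stage` shape, `gatePasses`, and the retry limits are assumptions, not the skill's actual interfaces.

```typescript
// Hypothetical sketch of the gate-checked pipeline loop.
type StageResult = { score: number };

type Stage = {
  name: string;
  run: () => Promise<StageResult>;              // invoke the underlying skill
  gatePasses: (result: StageResult) => boolean; // quality gate check
  maxRetries: number;                           // bounded retries before escalation
};

async function runPipeline(stages: Stage[]): Promise<void> {
  for (const stage of stages) {
    let attempts = 0;
    for (;;) {
      const result = await stage.run();
      if (stage.gatePasses(result)) break; // gate passed: move to next stage
      attempts += 1;
      if (attempts > stage.maxRetries) {
        // doom loop or hard failure: stop and hand off to a human
        throw new Error(`Escalate: ${stage.name} failed ${attempts} times`);
      }
      // otherwise apply gap-analysis fixes and retry this stage
    }
  }
  console.log('All gates passed: production ready');
}
```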
---
When to Use
Use end-to-end-orchestrator when:
- Implementing complete features (not quick fixes)
- Want automated workflow (not manual skill chaining)
- Production-quality required (all gates must pass)
- Time optimization important (parallel where possible)
- Need failure recovery (automatic retry/rollback)
When NOT to Use:
- Quick fixes (<30 minutes)
- Exploratory work (uncertain requirements)
- Manual control preferred (step through each phase)
---
Prerequisites
Required
- All 5 multi-ai skills installed:
- multi-ai-research
- multi-ai-planning
- multi-ai-implementation
- multi-ai-testing
- multi-ai-verification
Optional
- agent-memory-system (for learning from past work)
- hooks-manager (for automation)
- Gemini CLI, Codex CLI (for tri-AI research)
---
Complete Workflow
Stage 1: Research (Optional)
Purpose: Ground implementation in proven patterns
Process:
- Determine if Research Needed:
```typescript
// Check if objective is familiar
const similarWork = await recallMemory({
type: 'episodic',
query: objective
});
if (similarWork.length === 0) {
// Unfamiliar domain → research needed
needsResearch = true;
} else {
// Familiar → can skip research, use past learnings
needsResearch = false;
}
```
- Execute Research (if needed):
```
Use multi-ai-research for "[domain] implementation patterns and best practices"
```
What It Provides:
- Claude research: Official docs, codebase patterns
- Gemini research: Web best practices, latest trends
- Codex research: GitHub patterns, code examples
- Quality: ≥95/100 with 100% citations
- Quality Gate: Research Complete:
```markdown
✅ Research findings documented
✅ Patterns identified (minimum 2)
✅ Best practices extracted (minimum 3)
✅ Quality score ≥95/100
```
If Fail: Research incomplete → retry research OR proceed without (user decides)
Outputs:
- Research findings (.analysis/ANALYSIS_FINAL.md)
- Patterns and best practices
- Implementation recommendations
Time: 30-60 minutes (can skip if familiar domain)
Next: Proceed to Stage 2
---
Stage 2: Planning
Purpose: Create agent-executable plan with quality ≥90/100
Process:
- Load Research Context (if research done):
```typescript
let context = "";
if (researchDone) {
context = await readFile('.analysis/ANALYSIS_FINAL.md');
}
```
- Invoke Planning:
```
Use multi-ai-planning to create plan for [objective]
${context ? 'Research findings available in: .analysis/ANALYSIS_FINAL.md' : ''}
Create comprehensive plan following 6-step workflow.
```
What It Does:
- Analyzes objective
- Hierarchical decomposition (8-15 tasks)
- Maps dependencies, identifies parallel
- Plans verification for all tasks
- Scores quality (0-100)
- Quality Gate: Plan Approved:
```markdown
✅ Plan created
✅ Quality score ≥90/100
✅ All tasks have verification
✅ Dependencies mapped
✅ No circular dependencies
```
If Fail (score <90):
- Review gap analysis
- Apply recommended fixes
- Re-verify
- Retry up to 2 times
- If still <90: Escalate to human review (see the retry sketch at the end of this stage)
- Save Plan to Shared State:
```bash
# Save for next stage
cp plans/[plan-id]/plan.json .multi-ai-context/plan.json
```
Outputs:
- plan.json (machine-readable)
- PLAN.md (human-readable)
- COORDINATION.md (execution guide)
- Quality ≥90/100
Time: 1.5-3 hours
Next: Proceed to Stage 3
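A minimal sketch of the plan-approval gate described above (score ≥90, at most two automatic retries, then human review). `createPlan` and `applyGapFixes` are hypothetical stubs standing in for the multi-ai-planning steps:

```typescript
// Hypothetical stubs for the multi-ai-planning skill's steps.
type Plan = { score: number; gap_analysis: string[] };

const createPlan = async (_objective: string): Promise<Plan> =>
  ({ score: 85, gap_analysis: ['some tasks missing verification'] }); // stub

const applyGapFixes = async (plan: Plan): Promise<Plan> =>
  ({ ...plan, score: plan.score + 5, gap_analysis: [] }); // stub

async function planWithGate(objective: string): Promise<Plan> {
  let plan = await createPlan(objective);
  // Quality gate: score >= 90, with up to 2 automatic retries
  for (let retry = 0; plan.score < 90 && retry < 2; retry++) {
    plan = await applyGapFixes(plan); // address issues from gap analysis
  }
  if (plan.score < 90) {
    throw new Error('Plan still <90/100 after 2 retries: escalate to human review');
  }
  return plan;
}
```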
---
Stage 3: Implementation
Purpose: Execute plan with TDD, produce working code
Process:
- Load Plan:
```typescript
const plan = JSON.parse(readFile('.multi-ai-context/plan.json'));
console.log(`Loaded plan: ${plan.objective}`);
console.log(`  Tasks: ${plan.tasks.length}`);
console.log(`  Estimated: ${plan.metadata.estimated_total_hours} hours`);
```
- Invoke Implementation:
```
Use multi-ai-implementation following plan in .multi-ai-context/plan.json
Execute all 6 steps:
1. Explore & gather context
2. Plan architecture (plan already created, refine as needed)
3. Implement incrementally with TDD
4. Coordinate multi-agent (if parallel tasks)
5. Integration & E2E testing
6. Quality verification before commit
Success criteria from plan.
```
What It Does:
- Explores codebase (progressive disclosure)
- Implements incrementally (<200 lines per commit)
- Test-driven development (tests first)
- Multi-agent coordination for parallel tasks
- Continuous testing during implementation
- Doom loop prevention (max 3 retries)
- Quality Gate: Implementation Complete:
```markdown
✅ All plan tasks implemented
✅ All tests passing
✅ Coverage ≥80% (gate), ideally ≥95%
✅ No regressions
✅ Doom loop avoided (< max retries)
```
If Fail:
- Identify failing task
- Retry with different approach
- If 3 failures: Escalate to human (see the recovery sketch at the end of this stage)
- Save state for recovery
- Save Implementation State:
```bash
# Save for next stage
echo '{
"status": "implemented",
"files_changed": [...],
"tests_run": 95,
"tests_passed": 95,
"coverage": 87,
"commits": ["abc123", "def456"]
}' > .multi-ai-context/implementation-status.json
```
Outputs:
- Working code
- Tests passing
- Coverage ≥80%
- Commits created
Time: 3-10 hours (varies by complexity)
Next: Proceed to Stage 4
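The retry-and-escalate behavior above can be pictured as checkpoint, retry, rollback around each task. This is a sketch under assumptions: `implementTask` is a hypothetical stand-in for one implementation attempt, and checkpoints are modeled as git tags to mirror the `git checkout checkpoint-003` recovery shown later:

```typescript
// Hypothetical doom-loop prevention around a single plan task.
import { execSync } from 'node:child_process';

const createCheckpoint = (id: string) => execSync(`git tag -f ${id}`);
const rollbackTo = (id: string) => execSync(`git checkout ${id} -- .`);

async function runTaskWithRecovery(
  taskId: string,
  implementTask: (taskId: string, attempt: number) => Promise<boolean>,
): Promise<void> {
  const checkpoint = `checkpoint-${taskId}`;
  createCheckpoint(checkpoint);
  for (let attempt = 1; attempt <= 3; attempt++) {
    if (await implementTask(taskId, attempt)) return; // tests pass: task done
    rollbackTo(checkpoint); // discard the failed attempt, try a new approach
  }
  // doom loop: 3 failed attempts on the same task
  throw new Error(`Doom loop on ${taskId}: escalating (state at ${checkpoint})`);
}
```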
---
Stage 4: Testing (Independent Verification)
Purpose: Verify tests are comprehensive and prevent gaming
Process:
- Load Implementation Context:
```typescript
const implStatus = JSON.parse(
readFile('.multi-ai-context/implementation-status.json')
);
console.log(`Testing implementation:`);
console.log(`  Files changed: ${implStatus.files_changed.length}`);
console.log(`  Current coverage: ${implStatus.coverage}%`);
```
- Invoke Independent Testing:
```
Use multi-ai-testing independent verification workflow
Verify:
- Tests in: tests/
- Code in: src/
- Specifications in: .multi-ai-context/plan.json
Workflows to execute:
1. Test quality verification (independent agent)
2. Coverage validation (≥95% target)
3. Edge case discovery (AI-powered)
4. Multi-agent ensemble scoring (if critical feature)
Score test quality (0-100).
```
What It Does:
- Independent verification (separate agent from impl)
- Checks tests match specifications (not just what code does)
- Generates additional edge case tests
- Multi-agent ensemble for quality scoring
- Prevents overfitting
- Quality Gate: Testing Verified:
```markdown
✅ Test quality score ≥90/100
✅ Coverage ≥95% (target achieved)
✅ Independent verification passed
✅ No test gaming detected
✅ Edge cases covered
```
If Fail:
- Review test quality issues
- Generate additional tests
- Re-verify
- Max 2 retries, then escalate
- Save Testing State:
```bash
echo '{
"status": "tested",
"test_quality_score": 92,
"coverage": 96,
"tests_total": 112,
"edge_cases": 23,
"gaming_detected": false
}' > .multi-ai-context/testing-status.json
```
Outputs:
- Test quality ≥90/100
- Coverage ≥95%
- Independent verification passed
Time: 1-3 hours
Next: Proceed to Stage 5
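A minimal sketch of enforcing this stage's gate from the shared state file, reusing the `testing-status.json` field names shown above and the thresholds from this document:

```typescript
// Enforce the Stage 4 gate from .multi-ai-context/testing-status.json.
import { readFileSync } from 'node:fs';

const status = JSON.parse(
  readFileSync('.multi-ai-context/testing-status.json', 'utf8'),
);

const gatePassed =
  status.test_quality_score >= 90 &&  // test quality gate
  status.coverage >= 95 &&            // coverage target
  status.gaming_detected === false;   // independent verification check

if (!gatePassed) {
  console.error('Testing gate failed: generate additional tests and re-verify');
  process.exit(1);
}
console.log('Testing gate passed: proceed to Stage 5');
```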
---
Stage 5: Verification (Multi-Layer QA)
Purpose: Final quality assurance before production
Process:
- Load All Context:
```typescript
const plan = JSON.parse(readFile('.multi-ai-context/plan.json'));
const implStatus = JSON.parse(readFile('.multi-ai-context/implementation-status.json'));
const testStatus = JSON.parse(readFile('.multi-ai-context/testing-status.json'));
console.log(`Final verification:`);
console.log(`  Objective: ${plan.objective}`);
console.log(`  Implementation: ${implStatus.status}`);
console.log(`  Testing: ${testStatus.coverage}% coverage`);
```
- Invoke Multi-Layer Verification:
```
Use multi-ai-verification for complete quality check
Verify:
- Code: src/
- Tests: tests/
- Plan: .multi-ai-context/plan.json
Execute all 5 layers:
1. Rules-based (linting, types, schema, SAST)
2. Functional (tests, coverage, examples)
3. Visual (if UI: screenshots, a11y)
4. Integration (E2E, API compatibility)
5. Quality scoring (LLM-as-judge, 0-100)
All 5 quality gates must pass.
```
What It Does:
- Runs all 5 verification layers
- Each layer is independent
- LLM-as-judge for holistic assessment
- Agent-as-a-Judge can execute tools to verify claims
- Multi-agent ensemble for critical features
- Quality Gate: Production Ready:
```markdown
✅ Layer 1 (Rules): PASS
✅ Layer 2 (Functional): PASS, coverage 96%
✅ Layer 3 (Visual): PASS or SKIPPED
✅ Layer 4 (Integration): PASS
✅ Layer 5 (Quality): 92/100 ≥ 90 ✅
ALL GATES PASSED → PRODUCTION APPROVED
```
If Fail:
- Review gap analysis from failed layer
- Apply recommended fixes
- Re-verify from failed layer (not all 5)
- Max 2 retries per layer
- If still failing: Escalate to human
- Generate Final Report:
```markdown
# Feature Implementation Complete
Objective: [from plan]
## Pipeline Execution Summary
### Stage 1: Research
- Status: ✅ Complete
- Quality: 97/100
- Time: 52 minutes
### Stage 2: Planning
- Status: ✅ Complete
- Quality: 94/100
- Tasks: 23
- Time: 1.8 hours
### Stage 3: Implementation
- Status: ✅ Complete
- Files changed: 15
- Lines added: 847
- Commits: 12
- Time: 6.2 hours
### Stage 4: Testing
- Status: ✅ Complete
- Test quality: 92/100
- Coverage: 96%
- Tests: 112
- Time: 1.5 hours
### Stage 5: Verification
- Status: ✅ Complete
- Quality score: 92/100
- All layers: PASS
- Time: 1.2 hours
## Final Metrics
- Total Time: 11.3 hours
- Quality: 92/100
- Coverage: 96%
- Status: ✅ PRODUCTION READY
## Commits
- abc123: feat: Add database schema
- def456: feat: Implement OAuth integration
- [... 10 more ...]
## Next Steps
- Create PR for team review
- Deploy to staging
- Production release
```
- Save to Memory (if agent-memory-system available):
```typescript
await storeMemory({
type: 'episodic',
event: {
description: `Complete implementation: ${objective}`,
outcomes: {
total_time: 11.3,
quality_score: 92,
test_coverage: 96,
stages_completed: 5
},
learnings: extractedDuringPipeline
}
});
```
Outputs:
- Production-ready code
- Comprehensive final report
- Commits created
- PR ready (if requested)
- Memory saved for future learning
Time: 30-90 minutes
Result: ✅ PRODUCTION READY
---
Failure Recovery
Failure Handling at Each Stage
Stage Fails → Recovery Strategy:
Research Fails:
- Retry with different sources
- Skip research (use memory if available)
- Escalate to human if critical gap
Planning Fails (score <90):
- Review gap analysis
- Apply fixes automatically if possible
- Retry planning (max 2 attempts)
- Escalate if still <90
Implementation Fails:
- Identify failing task
- Automatic rollback to last checkpoint
- Retry with alternative approach
- Doom loop prevention (max 3 retries)
- Escalate with full error context
Testing Fails (coverage <80% or quality <90):
- Generate additional tests for gaps
- Retry verification
- Max 2 retries
- Escalate with coverage report
Verification Fails (score <90 or layer fails):
- Apply auto-fixes for Layer 1-2 issues
- Manual fixes needed for Layer 3-5
- Re-verify from failed layer (not all 5)
- Max 2 retries per layer
- Escalate with quality report
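The "re-verify from failed layer" strategy amounts to resuming the layer list at the failure index instead of restarting from layer 1. In this sketch `runLayer` is a hypothetical stand-in for the five verification layers:

```typescript
// Resume verification from the failed layer, not from the beginning.
const layers = ['rules', 'functional', 'visual', 'integration', 'quality'];

async function verifyFrom(
  startIndex: number,
  runLayer: (name: string) => Promise<boolean>,
): Promise<number> {
  for (let i = startIndex; i < layers.length; i++) {
    if (!(await runLayer(layers[i]))) return i; // index of the failed layer
  }
  return -1; // all layers passed
}

// On failure at layer i: apply fixes, then call verifyFrom(i, runLayer)
// again (max 2 retries per layer) rather than verifyFrom(0, ...).
```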
---
Escalation Protocol
When to Escalate to Human:
- Any stage fails 3 times (doom loop)
- Planning quality <80 after 2 retries
- Implementation doom loop detected
- Verification score <80 after 2 retries
- Budget exceeded (if cost tracking enabled)
- Circular dependency detected
- Irrecoverable error (file system, permissions)
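Doom loops can be detected from the `failure-history.json` file listed under State Management below. The record shape in this sketch is an assumption; the 3-failure threshold matches the rules above:

```typescript
// Hypothetical doom-loop detector over .multi-ai-context/failure-history.json.
import { existsSync, readFileSync, writeFileSync } from 'node:fs';

const HISTORY = '.multi-ai-context/failure-history.json';

// Returns true once the same stage has failed 3 times (doom loop: escalate).
function recordFailure(stage: string, error: string): boolean {
  const history: Record<string, string[]> = existsSync(HISTORY)
    ? JSON.parse(readFileSync(HISTORY, 'utf8'))
    : {};
  (history[stage] ??= []).push(error);
  writeFileSync(HISTORY, JSON.stringify(history, null, 2));
  return history[stage].length >= 3;
}
```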
Escalation Format:
```markdown
# ⚠️ ESCALATION REQUIRED
Stage: Implementation (Stage 3)
Failure: Doom loop detected (3 failed attempts)
Context
- Objective: Implement user authentication
- Failing Task: 2.2.2 Token generation
- Error: Tests fail with "undefined userId" repeatedly
Attempts Made
- Attempt 1: Added userId to payload → Same error
- Attempt 2: Changed payload structure → Same error
- Attempt 3: Different JWT library → Same error
Root Cause Analysis
- Tests expect `user.id` but implementation uses `user.userId`
- Mismatch in data model between test and implementation
- Auto-fix failed 3 times
Recommended Actions
- Review test specifications vs. implementation
- Align data model (user.id vs. user.userId)
- Manual intervention required
State Saved
- Checkpoint: checkpoint-003 (before attempts)
- Rollback available: `git checkout checkpoint-003`
- Continue after fix: Resume from Task 2.2.2
```
---
Parallel Execution Optimization
Identifying Parallel Opportunities
From Plan:
```typescript
const plan = JSON.parse(readFile('.multi-ai-context/plan.json'));
// Plan identifies parallel groups
const parallelGroups = plan.parallel_groups;
// Example:
// Group 1: Tasks 2.1, 2.2, 2.3 (independent)
// Can execute in parallel
```
Executing Parallel Tasks
Pattern:
```typescript
// Stage 3: Implementation with parallel tasks
const parallelGroup = plan.parallel_groups.find(g => g.group_id === 'pg2');
// Spawn parallel implementation agents
const results = await Promise.all(
parallelGroup.tasks.map(taskId => {
const task = plan.tasks.find(t => t.id === taskId);
return Task({ // spawn a parallel implementation agent (Task tool)
      description: `Implement ${task.description}`,
prompt: `Implement task ${task.id}: ${task.description}
Specifications from plan:
${JSON.stringify(task, null, 2)}
Success criteria:
${task.verification.success_criteria.join('\n')}
Write implementation and tests.
Report completion status.`
});
})
);
// Verify all parallel tasks completed
const allSucceeded = results.every(r => r.status === 'complete');
if (allSucceeded) {
// Proceed to integration
} else {
// Handle failures
}
```
Time Savings: 20-40% faster than sequential execution
---
State Management
Cross-Skill State Sharing
Shared Context Directory: .multi-ai-context/
Standard Files:
```
.multi-ai-context/
├── research-findings.json       # From multi-ai-research
├── plan.json                    # From multi-ai-planning
├── implementation-status.json   # From multi-ai-implementation
├── testing-status.json          # From multi-ai-testing
├── verification-report.json     # From multi-ai-verification
├── pipeline-state.json          # Orchestrator state
└── failure-history.json         # For doom loop detection
```
Benefits:
- Skills don't duplicate work
- Later stages read earlier outputs
- Failure recovery knows full state
- Memory can be saved from shared state
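Minimal read/write helpers for this layout; the directory and file names follow the standard files above, but the helper API itself is an assumption:

```typescript
// Shared-state helpers for the .multi-ai-context/ directory.
import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

const CONTEXT_DIR = '.multi-ai-context';

function saveState<T>(file: string, state: T): void {
  mkdirSync(CONTEXT_DIR, { recursive: true });
  writeFileSync(join(CONTEXT_DIR, file), JSON.stringify(state, null, 2));
}

function loadState<T>(file: string): T {
  return JSON.parse(readFileSync(join(CONTEXT_DIR, file), 'utf8'));
}

// Example: Stage 3 reads Stage 2's output without re-planning.
// const plan = loadState<{ objective: string }>('plan.json');
```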
---
Progress Tracking
Real-Time Progress:
```json
{
"pipeline_id": "pipeline_20250126_1200",
"objective": "Implement user authentication",
"started_at": "2025-01-26T12:00:00Z",
"current_stage": 3,
"stages": [
{
"stage": 1,
"name": "Research",
"status": "complete",
"duration_minutes": 52,
"quality": 97
},
{
"stage": 2,
"name": "Planning",
"status": "complete",
"duration_minutes": 108,
"quality": 94
},
{
"stage": 3,
"name": "Implementation",
"status": "in_progress",
"started_at": "2025-01-26T13:48:00Z",
"tasks_total": 23,
"tasks_complete": 15,
"tasks_remaining": 8,
"percent_complete": 65
},
{
"stage": 4,
"name": "Testing",
"status": "pending"
},
{
"stage": 5,
"name": "Verification",
"status": "pending"
}
],
"estimated_completion": "2025-01-26T20:00:00Z",
"quality_target": 90,
"current_quality_estimate": 92
}
```
Query Progress:
```bash
# Check current status
cat .multi-ai-context/pipeline-state.json | jq '.current_stage, .stages[2].percent_complete'
# Output: Stage 3, 65% complete
```
---
Workflow Modes
Standard Mode (Full Pipeline)
All 5 Stages:
```
Research → Planning → Implementation → Testing → Verification
```
Time: 8-20 hours
Quality: Maximum (all gates, ≥90)
Use For: Production features, complex implementations
---
Fast Mode (Skip Research)
4 Stages (familiar domains):
```
Planning → Implementation → Testing → Verification
```
Time: 6-15 hours
Quality: High (all gates except research)
Use For: Familiar domains, time-sensitive features
---
Quick Mode (Essential Gates Only)
Implementation + Basic Verification:
```
Planning → Implementation → Testing (basic) → Verification (Layers 1-2 only)
```
Time: 3-8 hours
Quality: Good (essential gates only)
Use For: Internal tools, prototypes
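One way to picture the three modes is as stage subsets. The mapping below mirrors this section; representing modes this way is an assumption about the orchestrator's internals:

```typescript
// Workflow modes as stage subsets (names mirror this section).
type Mode = 'standard' | 'fast' | 'quick';

const MODE_STAGES: Record<Mode, string[]> = {
  standard: ['research', 'planning', 'implementation', 'testing', 'verification'],
  fast: ['planning', 'implementation', 'testing', 'verification'],
  quick: ['planning', 'implementation', 'testing (basic)', 'verification (layers 1-2)'],
};

const stagesFor = (mode: Mode): string[] => MODE_STAGES[mode];
```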
---
Best Practices
1. Always Run Planning Stage
Even for "simple" features - planning quality ≥90 prevents issues
2. Use Memory to Skip Research
If similar work done before, recall patterns instead of researching
3. Monitor Progress
Check .multi-ai-context/pipeline-state.json to track progress
4. Trust the Quality Gates
If gate fails, there's a real issue - don't skip fixes
5. Save State Frequently
Each stage completion saves state (enables recovery)
6. Review Final Report
Complete understanding of what was built and quality achieved
---
Integration Points
With All 5 Multi-AI Skills
Coordinates:
- multi-ai-research (Stage 1)
- multi-ai-planning (Stage 2)
- multi-ai-implementation (Stage 3)
- multi-ai-testing (Stage 4)
- multi-ai-verification (Stage 5)
Provides:
- Automatic skill invocation
- Quality gate enforcement
- Failure recovery
- State management
- Progress tracking
- Final reporting
---
With agent-memory-system
Before Pipeline:
- Recall similar past work
- Load learned patterns
- Skip research if memory sufficient
After Pipeline:
- Save complete episode to memory
- Extract learnings
- Update procedural patterns
- Improve estimation accuracy
---
With hooks-manager
Session Hooks:
- SessionStart: Load pipeline state
- SessionEnd: Save pipeline progress
- PostToolUse: Track stage completions
Notification Hooks:
- Send telemetry on stage completions
- Alert on gate failures
- Track quality scores
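As a rough wiring sketch, these hooks could be registered as commands on the listed events. The fragment below is an assumption about the configuration format, and `save-pipeline-progress.sh` is a hypothetical script, not part of hooks-manager's documented schema:

```json
{
  "hooks": {
    "SessionStart": [
      { "hooks": [{ "type": "command", "command": "cat .multi-ai-context/pipeline-state.json" }] }
    ],
    "SessionEnd": [
      { "hooks": [{ "type": "command", "command": "scripts/save-pipeline-progress.sh" }] }
    ]
  }
}
```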
---
Quick Reference
The 5-Stage Pipeline
| Stage | Skill | Time | Quality Gate | Output |
|-------|-------|------|--------------|--------|
| 1 | multi-ai-research | 30-60m | ≥95/100 | Research findings |
| 2 | multi-ai-planning | 1.5-3h | ≥90/100 | Executable plan |
| 3 | multi-ai-implementation | 3-10h | Tests pass, ≥80% cov | Working code |
| 4 | multi-ai-testing | 1-3h | ≥95% cov, quality ≥90 | Verified tests |
| 5 | multi-ai-verification | 1-3h | ≥90/100, all layers | Production ready |
Total: 8-20 hours → Production-ready feature
Workflow Modes
| Mode | Stages | Time | Quality | Use For |
|------|--------|------|---------|---------|
| Standard | All 5 | 8-20h | Maximum | Production features |
| Fast | 2-5 (skip research) | 6-15h | High | Familiar domains |
| Quick | 2-5 (basic gates) | 3-8h | Good | Internal tools |
Quality Gates
- Research: ≥95/100, patterns identified
- Planning: ≥90/100, all tasks verifiable
- Implementation: Tests pass, coverage ≥80%
- Testing: Quality ≥90/100, coverage ≥95%
- Verification: ≥90/100, all 5 layers pass
---
end-to-end-orchestrator provides complete automation from feature description to production-ready code, coordinating all 5 multi-ai skills with quality gates, failure recovery, and state management - delivering enterprise-grade development workflows in a single command.
For examples, see examples/. For failure recovery, see Failure Recovery section.