checkpoint-workflow-builder
π―Skillfrom jackspace/claudeskillz
checkpoint-workflow-builder skill from jackspace/claudeskillz
Part of
jackspace/claudeskillz(48 items)
Installation
git clone https://github.com/jackspace/ClaudeSkillz.git temp-repogit clone https://github.com/jackspace/ClaudeSkillz.gitpython tools/generate-catalog.pypython -m http.server 8000Skill Details
Build resumable state-machine workflows with checkpoint patterns, progress preservation, and automatic recovery for complex multi-phase operations that need to survive interruptions, timeouts, and failures.
Overview
# Checkpoint Workflow Builder
Design and implement fault-tolerant workflows that can resume from any point of failure.
Overview
Complex workflows often fail mid-execution due to:
- Network timeouts
- System crashes
- Resource exhaustion
- External service failures
- User interruptions
This skill teaches you to build workflows that:
- Save progress at key checkpoints
- Resume from last successful state
- Handle partial failures gracefully
- Provide clear progress visibility
- Enable manual intervention points
When to Use
Use this skill when:
- Building multi-phase data pipelines
- Implementing long-running migration scripts
- Creating deployment workflows
- Processing large batches of items
- Orchestrating multi-system operations
- Building ETL (Extract, Transform, Load) workflows
- Implementing saga patterns for distributed systems
- Creating user-facing wizards with save/resume
Core Concepts
State Machine Pattern
```
βββββββββββ
β INIT β
ββββββ¬βββββ
β
βΌ
ββββββββββββββ
β DOWNLOAD β
βββββββ¬βββββββ
β
βΌ
ββββββββββββββ
β PROCESS β
βββββββ¬βββββββ
β
βΌ
ββββββββββββββ
β VALIDATE β
βββββββ¬βββββββ
β
βΌ
ββββββββββββββ
β FINALIZE β
βββββββ¬βββββββ
β
βΌ
ββββββββββββββ
β COMPLETE β
ββββββββββββββ
```
Each state:
- Has clear entry conditions
- Performs specific operations
- Saves checkpoint before transition
- Can be resumed independently
Basic Implementation
Simple State Machine
```bash
#!/bin/bash
# state-machine.sh - Basic resumable workflow
STATE_FILE=".workflow_state"
# Read current state (default: INIT)
CURRENT_STATE=$(cat "$STATE_FILE" 2>/dev/null || echo "INIT")
echo "Current state: $CURRENT_STATE"
case "$CURRENT_STATE" in
INIT)
echo "=== Phase 1: Initialization ==="
# Initialize workspace
mkdir -p workspace
mkdir -p results
# Download dependencies
echo "Setting up environment..."
# Save next state
echo "DOWNLOAD" > "$STATE_FILE"
echo "β Initialization complete"
echo "Run again to continue"
;;
DOWNLOAD)
echo "=== Phase 2: Download Data ==="
# Download data files
echo "Downloading data..."
# curl -o workspace/data.zip https://example.com/data.zip
# Verify download
if [ -f "workspace/data.zip" ]; then
echo "EXTRACT" > "$STATE_FILE"
echo "β Download complete"
echo "Run again to continue"
else
echo "β Download failed - fix and run again"
exit 1
fi
;;
EXTRACT)
echo "=== Phase 3: Extract Data ==="
# Extract files
echo "Extracting data..."
# unzip workspace/data.zip -d workspace/
echo "PROCESS" > "$STATE_FILE"
echo "β Extraction complete"
echo "Run again to continue"
;;
PROCESS)
echo "=== Phase 4: Process Data ==="
# Process data
echo "Processing data..."
# ./process_data.sh workspace/ results/
echo "VALIDATE" > "$STATE_FILE"
echo "β Processing complete"
echo "Run again to continue"
;;
VALIDATE)
echo "=== Phase 5: Validate Results ==="
# Validate results
echo "Validating results..."
# ./validate.sh results/
if [ $? -eq 0 ]; then
echo "FINALIZE" > "$STATE_FILE"
echo "β Validation passed"
echo "Run again to finalize"
else
echo "β Validation failed"
echo "Fix issues and change state to PROCESS to reprocess"
exit 1
fi
;;
FINALIZE)
echo "=== Phase 6: Finalize ==="
# Cleanup and finalize
echo "Finalizing workflow..."
# mv results/ final/
# rm -rf workspace/
echo "COMPLETE" > "$STATE_FILE"
echo "β Workflow complete!"
;;
COMPLETE)
echo "=== Workflow Already Complete ==="
echo "Results available in: final/"
;;
*)
echo "β Unknown state: $CURRENT_STATE"
echo "Reset with: echo 'INIT' > $STATE_FILE"
exit 1
;;
esac
```
Enhanced with Progress Tracking
```bash
#!/bin/bash
# enhanced-state-machine.sh - With detailed progress
STATE_FILE=".workflow_state"
PROGRESS_FILE=".workflow_progress.json"
# Initialize progress tracking
init_progress() {
cat > "$PROGRESS_FILE" << EOF
{
"current_state": "INIT",
"started_at": "$(date -Iseconds)",
"updated_at": "$(date -Iseconds)",
"phases": {
"INIT": {"status": "pending", "started": null, "completed": null},
"DOWNLOAD": {"status": "pending", "started": null, "completed": null},
"PROCESS": {"status": "pending", "started": null, "completed": null},
"VALIDATE": {"status": "pending", "started": null, "completed": null},
"FINALIZE": {"status": "pending", "started": null, "completed": null}
}
}
EOF
}
# Update progress
update_progress() {
local state="$1"
local status="$2" # "running", "completed", "failed"
local timestamp="$(date -Iseconds)"
if [ ! -f "$PROGRESS_FILE" ]; then
init_progress
fi
jq --arg state "$state" \
--arg status "$status" \
--arg timestamp "$timestamp" \
'.current_state = $state |
.updated_at = $timestamp |
.phases[$state].status = $status |
(.phases[$state].started //= $timestamp) |
(if $status == "completed" then .phases[$state].completed = $timestamp else . end)' \
"$PROGRESS_FILE" > "$PROGRESS_FILE.tmp"
mv "$PROGRESS_FILE.tmp" "$PROGRESS_FILE"
}
# Show progress
show_progress() {
if [ ! -f "$PROGRESS_FILE" ]; then
echo "No progress file found"
return
fi
echo "=== Workflow Progress ==="
echo ""
jq -r '.phases | to_entries | .[] |
"[\(.value.status | ascii_upcase)] \(.key)" +
(if .value.started then " (started: " + .value.started + ")" else "" end)' \
"$PROGRESS_FILE"
echo ""
echo "Current: $(jq -r '.current_state' $PROGRESS_FILE)"
echo "Updated: $(jq -r '.updated_at' $PROGRESS_FILE)"
}
# Workflow implementation
run_workflow() {
CURRENT_STATE=$(cat "$STATE_FILE" 2>/dev/null || echo "INIT")
# Show progress before executing
show_progress
case "$CURRENT_STATE" in
INIT)
update_progress "INIT" "running"
echo "Initializing..."
# ... initialization logic ...
update_progress "INIT" "completed"
echo "DOWNLOAD" > "$STATE_FILE"
;;
DOWNLOAD)
update_progress "DOWNLOAD" "running"
echo "Downloading..."
# ... download logic ...
update_progress "DOWNLOAD" "completed"
echo "PROCESS" > "$STATE_FILE"
;;
# ... other states ...
esac
}
run_workflow
```
Advanced Patterns
Pattern 1: Batched Checkpoint
```bash
#!/bin/bash
# batched-checkpoint.sh - Process items with batch checkpoints
ITEMS_FILE="items.txt"
CHECKPOINT_FILE=".batch_checkpoint"
BATCH_SIZE=10
# Load checkpoint
if [ -f "$CHECKPOINT_FILE" ]; then
LAST_COMPLETED=$(cat "$CHECKPOINT_FILE")
echo "Resuming from item $LAST_COMPLETED"
else
LAST_COMPLETED=0
fi
TOTAL_ITEMS=$(wc -l < "$ITEMS_FILE")
ITEMS_PROCESSED=0
# Process in batches
tail -n +$((LAST_COMPLETED + 1)) "$ITEMS_FILE" | while read -r item; do
# Process item
echo "Processing: $item"
process_item "$item"
ITEMS_PROCESSED=$((ITEMS_PROCESSED + 1))
# Checkpoint every batch
if [ $((ITEMS_PROCESSED % BATCH_SIZE)) -eq 0 ]; then
CURRENT_POSITION=$((LAST_COMPLETED + ITEMS_PROCESSED))
echo "$CURRENT_POSITION" > "$CHECKPOINT_FILE"
echo "Checkpoint: $CURRENT_POSITION/$TOTAL_ITEMS"
# Optional: Break for timeout management
if [ $((ITEMS_PROCESSED)) -ge $((BATCH_SIZE * 5)) ]; then
echo "Processed 5 batches ($(($BATCH_SIZE * 5)) items)"
echo "Run again to continue"
exit 0
fi
fi
done
# Final checkpoint
echo "$TOTAL_ITEMS" > "$CHECKPOINT_FILE"
echo "β All items processed!"
```
Pattern 2: Rollback Support
```bash
#!/bin/bash
# rollback-workflow.sh - State machine with rollback
STATE_FILE=".state"
ROLLBACK_DIR=".rollback"
mkdir -p "$ROLLBACK_DIR"
# Save rollback point
save_rollback() {
local state="$1"
local timestamp=$(date +%s)
tar -czf "$ROLLBACK_DIR/${state}_${timestamp}.tar.gz" workspace/ 2>/dev/null || true
echo "${state}_${timestamp}" > "$ROLLBACK_DIR/latest"
}
# Perform rollback
rollback() {
local rollback_point="$1"
if [ -z "$rollback_point" ]; then
rollback_point=$(cat "$ROLLBACK_DIR/latest" 2>/dev/null)
fi
if [ -z "$rollback_point" ] || [ ! -f "$ROLLBACK_DIR/${rollback_point}.tar.gz" ]; then
echo "β No rollback point found"
exit 1
fi
echo "Rolling back to: $rollback_point"
# Restore from backup
rm -rf workspace/
tar -xzf "$ROLLBACK_DIR/${rollback_point}.tar.gz"
# Set state
STATE=$(echo "$rollback_point" | cut -d'_' -f1)
echo "$STATE" > "$STATE_FILE"
echo "β Rolled back to state: $STATE"
}
# In workflow
case "$CURRENT_STATE" in
DOWNLOAD)
save_rollback "PRE_DOWNLOAD"
# ... download logic ...
;;
PROCESS)
save_rollback "PRE_PROCESS"
# ... process logic ...
;;
esac
```
Pattern 3: Parallel Phase Execution
```bash
#!/bin/bash
# parallel-phases.sh - Execute independent phases in parallel
STATE_FILE=".parallel_state.json"
# Initialize parallel state
init_parallel_state() {
cat > "$STATE_FILE" << EOF
{
"phase_a": "pending",
"phase_b": "pending",
"phase_c": "pending",
"finalize": "pending"
}
EOF
}
# Update phase status
update_phase() {
local phase="$1"
local status="$2"
jq --arg phase "$phase" --arg status "$status" \
'.[$phase] = $status' "$STATE_FILE" > "$STATE_FILE.tmp"
mv "$STATE_FILE.tmp" "$STATE_FILE"
}
# Check if all phases complete
all_phases_complete() {
local incomplete=$(jq -r '[.phase_a, .phase_b, .phase_c] | map(select(. != "completed")) | length' "$STATE_FILE")
[ "$incomplete" -eq 0 ]
}
# Execute phase
execute_phase() {
local phase="$1"
local current_status=$(jq -r ".$phase" "$STATE_FILE")
if [ "$current_status" = "completed" ]; then
echo "$phase: Already complete"
return 0
fi
echo "$phase: Running"
update_phase "$phase" "running"
# Phase-specific logic
case "$phase" in
phase_a)
# Independent task A
task_a
;;
phase_b)
# Independent task B
task_b
;;
phase_c)
# Independent task C
task_c
;;
esac
if [ $? -eq 0 ]; then
update_phase "$phase" "completed"
echo "$phase: Completed"
else
update_phase "$phase" "failed"
echo "$phase: Failed"
return 1
fi
}
# Main workflow
[ ! -f "$STATE_FILE" ] && init_parallel_state
# Run independent phases
execute_phase "phase_a"
execute_phase "phase_b"
execute_phase "phase_c"
# Check if ready for finalization
if all_phases_complete; then
finalize_status=$(jq -r '.finalize' "$STATE_FILE")
if [ "$finalize_status" != "completed" ]; then
echo "All phases complete. Finalizing..."
finalize
update_phase "finalize" "completed"
echo "β Workflow complete!"
fi
else
echo "Some phases incomplete. Run again to retry."
fi
```
Real-World Examples
Example 1: Database Migration Workflow
```bash
#!/bin/bash
# db-migration-workflow.sh
STATE_FILE=".migration_state"
MIGRATION_LOG="migration.log"
log() {
echo "[$(date -Iseconds)] $1" | tee -a "$MIGRATION_LOG"
}
CURRENT_STATE=$(cat "$STATE_FILE" 2>/dev/null || echo "INIT")
case "$CURRENT_STATE" in
INIT)
log "=== Starting Migration ==="
log "Checking prerequisites..."
# Verify database connection
psql -U user -d database -c "SELECT 1" > /dev/null 2>&1 || {
log "β Database connection failed"
exit 1
}
# Verify backup destination
[ -d "/backups" ] || mkdir -p /backups
echo "BACKUP" > "$STATE_FILE"
log "β Prerequisites checked"
;;
BACKUP)
log "=== Creating Backup ==="
BACKUP_FILE="/backups/db_$(date +%Y%m%d_%H%M%S).sql"
pg_dump -U user database > "$BACKUP_FILE"
if [ -f "$BACKUP_FILE" ] && [ -s "$BACKUP_FILE" ]; then
log "β Backup created: $BACKUP_FILE"
echo "MIGRATE" > "$STATE_FILE"
else
log "β Backup failed"
exit 1
fi
;;
MIGRATE)
log "=== Running Migration ==="
# Apply migration scripts in order
for script in migrations/*.sql; do
log "Applying: $(basename $script)"
psql -U user -d database -f "$script" >> "$MIGRATION_LOG" 2>&1
if [ $? -ne 0 ]; then
log "β Migration failed: $script"
log "To rollback: ./db-migration-workflow.sh rollback"
exit 1
fi
done
echo "VALIDATE" > "$STATE_FILE"
log "β Migrations applied"
;;
VALIDATE)
log "=== Validating Migration ==="
# Run validation queries
VALIDATION_RESULT=$(psql -U user -d database -f validation.sql 2>&1)
if echo "$VALIDATION_RESULT" | grep -q "PASS"; then
log "β Validation passed"
echo "COMPLETE" > "$STATE_FILE"
else
log "β Validation failed"
log "$VALIDATION_RESULT"
log "Review and fix, then change state to MIGRATE to retry"
exit 1
fi
;;
COMPLETE)
log "=== Migration Complete ==="
log "Database migrated successfully"
;;
rollback)
log "=== Rolling Back Migration ==="
# Find latest backup
LATEST_BACKUP=$(ls -t /backups/db_*.sql | head -1)
if [ -z "$LATEST_BACKUP" ]; then
log "β No backup found"
exit 1
fi
log "Restoring from: $LATEST_BACKUP"
psql -U user -d database < "$LATEST_BACKUP"
log "β Rollback complete"
echo "INIT" > "$STATE_FILE"
;;
esac
```
Example 2: Data Pipeline Workflow
```bash
#!/bin/bash
# data-pipeline.sh - ETL pipeline with checkpoints
STATE_FILE=".pipeline_state.json"
init_state() {
cat > "$STATE_FILE" << EOF
{
"state": "EXTRACT",
"extracted_files": [],
"transformed_files": [],
"loaded_records": 0,
"total_records": 0
}
EOF
}
[ ! -f "$STATE_FILE" ] && init_state
CURRENT_STATE=$(jq -r '.state' "$STATE_FILE")
case "$CURRENT_STATE" in
EXTRACT)
echo "=== Extract Phase ==="
# Extract from sources
for source in source1 source2 source3; do
# Check if already extracted
if jq -e ".extracted_files | index(\"$source\")" "$STATE_FILE" > /dev/null; then
echo "β $source already extracted"
continue
fi
echo "Extracting from: $source"
extract_from_source "$source" > "raw_data/$source.csv"
# Update state
jq ".extracted_files += [\"$source\"]" "$STATE_FILE" > "$STATE_FILE.tmp"
mv "$STATE_FILE.tmp" "$STATE_FILE"
done
# Move to next state
jq '.state = "TRANSFORM"' "$STATE_FILE" > "$STATE_FILE.tmp"
mv "$STATE_FILE.tmp" "$STATE_FILE"
echo "β Extraction complete"
;;
TRANSFORM)
echo "=== Transform Phase ==="
for file in raw_data/*.csv; do
filename=$(basename "$file")
# Check if already transformed
if jq -e ".transformed_files | index(\"$filename\")" "$STATE_FILE" > /dev/null; then
echo "β $filename already transformed"
continue
fi
echo "Transforming: $filename"
transform_data "$file" > "processed_data/$filename"
# Update state
jq ".transformed_files += [\"$filename\"]" "$STATE_FILE" > "$STATE_FILE.tmp"
mv "$STATE_FILE.tmp" "$STATE_FILE"
done
# Count total records for loading
TOTAL=$(wc -l processed_data/*.csv | tail -1 | awk '{print $1}')
jq ".total_records = $TOTAL | .state = \"LOAD\"" "$STATE_FILE" > "$STATE_FILE.tmp"
mv "$STATE_FILE.tmp" "$STATE_FILE"
echo "β Transformation complete"
;;
LOAD)
echo "=== Load Phase ==="
LOADED=$(jq -r '.loaded_records' "$STATE_FILE")
TOTAL=$(jq -r '.total_records' "$STATE_FILE")
echo "Progress: $LOADED/$TOTAL records"
# Load data in batches
BATCH_SIZE=1000
tail -n +$((LOADED + 1)) processed_data/combined.csv | head -$BATCH_SIZE | while read record; do
load_record "$record"
LOADED=$((LOADED + 1))
# Checkpoint every 100 records
if [ $((LOADED % 100)) -eq 0 ]; then
jq ".loaded_records = $LOADED" "$STATE_FILE" > "$STATE_FILE.tmp"
mv "$STATE_FILE.tmp" "$STATE_FILE"
fi
done
# Check if complete
LOADED=$(jq -r '.loaded_records' "$STATE_FILE")
if [ "$LOADED" -ge "$TOTAL" ]; then
jq '.state = "COMPLETE"' "$STATE_FILE" > "$STATE_FILE.tmp"
mv "$STATE_FILE.tmp" "$STATE_FILE"
echo "β Loading complete"
else
echo "Loaded $LOADED/$TOTAL - run again to continue"
fi
;;
COMPLETE)
echo "=== Pipeline Complete ==="
jq '.' "$STATE_FILE"
;;
esac
```
Best Practices
β DO
- Design clear states - Each state has single responsibility
- Save frequently - Checkpoint after each significant operation
- Validate transitions - Verify state before transitioning
- Handle failures - Plan for failure at every state
- Log everything - Comprehensive logging for debugging
- Enable inspection - Make state file human-readable (JSON)
- Support rollback - Save enough info to undo
- Test resumption - Verify workflow resumes correctly
β DON'T
- Don't mix concerns - Keep states focused
- Don't skip validation - Validate before each transition
- Don't hide state - Make current state obvious
- Don't lose progress - Always save before risky operations
- Don't ignore errors - Handle errors explicitly
- Don't hardcode paths - Use configuration
- Don't forget cleanup - Remove checkpoint files when done
- Don't nest too deep - Keep state machine flat
Quick Reference
```bash
# Basic state machine
STATE=$(cat .state || echo "INIT")
case "$STATE" in
STATE1) action1; echo "STATE2" > .state ;;
STATE2) action2; echo "STATE3" > .state ;;
esac
# With progress tracking (JSON)
jq '.state = "NEW_STATE" | .updated = "'$(date -Iseconds)'"' .state.json
# Checkpoint pattern
process_batch && echo "$POSITION" > .checkpoint
# Rollback support
tar -czf .rollback/pre_${STATE}.tar.gz data/
# Reset workflow
echo "INIT" > .state && rm .progress.json
```
---
Version: 1.0.0
Author: Harvested from timeout-prevention and state machine patterns
Last Updated: 2025-11-18
License: MIT
Key Principle: Every operation should be resumable from its last successful checkpoint.
Build workflows that never lose progress! π
More from this repository10
web-asset-generator skill from jackspace/claudeskillz
ai-elements-chatbot skill from jackspace/claudeskillz
base-ui-react skill from jackspace/claudeskillz
Helps developers parse, validate, and manipulate JSON configuration files with intelligent error detection and automatic schema generation.
Diagnoses network connectivity issues by performing comprehensive tests including ping, traceroute, DNS resolution, and port scanning across multiple network interfaces.
cloudflare-nextjs skill from jackspace/claudeskillz
hugo skill from jackspace/claudeskillz
claude-code-bash-patterns skill from jackspace/claudeskillz
cloudflare-sandbox skill from jackspace/claudeskillz
docker-helper skill from jackspace/claudeskillz