🎯

checkpoint-workflow-builder

🎯Skill

from jackspace/claudeskillz

VibeIndex|
What it does

checkpoint-workflow-builder skill from jackspace/claudeskillz

πŸ“¦

Part of

jackspace/claudeskillz(48 items)

checkpoint-workflow-builder

Installation

git cloneClone repository
git clone https://github.com/jackspace/ClaudeSkillz.git temp-repo
git cloneClone repository
git clone https://github.com/jackspace/ClaudeSkillz.git
PythonRun Python server
python tools/generate-catalog.py
PythonRun Python server
python -m http.server 8000
πŸ“– Extracted from docs: jackspace/claudeskillz
5Installs
8
-
Last UpdatedNov 20, 2025

Skill Details

SKILL.md

Build resumable state-machine workflows with checkpoint patterns, progress preservation, and automatic recovery for complex multi-phase operations that need to survive interruptions, timeouts, and failures.

Overview

# Checkpoint Workflow Builder

Design and implement fault-tolerant workflows that can resume from any point of failure.

Overview

Complex workflows often fail mid-execution due to:

  • Network timeouts
  • System crashes
  • Resource exhaustion
  • External service failures
  • User interruptions

This skill teaches you to build workflows that:

  • Save progress at key checkpoints
  • Resume from last successful state
  • Handle partial failures gracefully
  • Provide clear progress visibility
  • Enable manual intervention points

When to Use

Use this skill when:

  • Building multi-phase data pipelines
  • Implementing long-running migration scripts
  • Creating deployment workflows
  • Processing large batches of items
  • Orchestrating multi-system operations
  • Building ETL (Extract, Transform, Load) workflows
  • Implementing saga patterns for distributed systems
  • Creating user-facing wizards with save/resume

Core Concepts

State Machine Pattern

```

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”

β”‚ INIT β”‚

β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜

β”‚

β–Ό

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”

β”‚ DOWNLOAD β”‚

β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜

β”‚

β–Ό

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”

β”‚ PROCESS β”‚

β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜

β”‚

β–Ό

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”

β”‚ VALIDATE β”‚

β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜

β”‚

β–Ό

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”

β”‚ FINALIZE β”‚

β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜

β”‚

β–Ό

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”

β”‚ COMPLETE β”‚

β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

```

Each state:

  • Has clear entry conditions
  • Performs specific operations
  • Saves checkpoint before transition
  • Can be resumed independently

Basic Implementation

Simple State Machine

```bash

#!/bin/bash

# state-machine.sh - Basic resumable workflow

STATE_FILE=".workflow_state"

# Read current state (default: INIT)

CURRENT_STATE=$(cat "$STATE_FILE" 2>/dev/null || echo "INIT")

echo "Current state: $CURRENT_STATE"

case "$CURRENT_STATE" in

INIT)

echo "=== Phase 1: Initialization ==="

# Initialize workspace

mkdir -p workspace

mkdir -p results

# Download dependencies

echo "Setting up environment..."

# Save next state

echo "DOWNLOAD" > "$STATE_FILE"

echo "βœ“ Initialization complete"

echo "Run again to continue"

;;

DOWNLOAD)

echo "=== Phase 2: Download Data ==="

# Download data files

echo "Downloading data..."

# curl -o workspace/data.zip https://example.com/data.zip

# Verify download

if [ -f "workspace/data.zip" ]; then

echo "EXTRACT" > "$STATE_FILE"

echo "βœ“ Download complete"

echo "Run again to continue"

else

echo "βœ— Download failed - fix and run again"

exit 1

fi

;;

EXTRACT)

echo "=== Phase 3: Extract Data ==="

# Extract files

echo "Extracting data..."

# unzip workspace/data.zip -d workspace/

echo "PROCESS" > "$STATE_FILE"

echo "βœ“ Extraction complete"

echo "Run again to continue"

;;

PROCESS)

echo "=== Phase 4: Process Data ==="

# Process data

echo "Processing data..."

# ./process_data.sh workspace/ results/

echo "VALIDATE" > "$STATE_FILE"

echo "βœ“ Processing complete"

echo "Run again to continue"

;;

VALIDATE)

echo "=== Phase 5: Validate Results ==="

# Validate results

echo "Validating results..."

# ./validate.sh results/

if [ $? -eq 0 ]; then

echo "FINALIZE" > "$STATE_FILE"

echo "βœ“ Validation passed"

echo "Run again to finalize"

else

echo "βœ— Validation failed"

echo "Fix issues and change state to PROCESS to reprocess"

exit 1

fi

;;

FINALIZE)

echo "=== Phase 6: Finalize ==="

# Cleanup and finalize

echo "Finalizing workflow..."

# mv results/ final/

# rm -rf workspace/

echo "COMPLETE" > "$STATE_FILE"

echo "βœ“ Workflow complete!"

;;

COMPLETE)

echo "=== Workflow Already Complete ==="

echo "Results available in: final/"

;;

*)

echo "βœ— Unknown state: $CURRENT_STATE"

echo "Reset with: echo 'INIT' > $STATE_FILE"

exit 1

;;

esac

```

Enhanced with Progress Tracking

```bash

#!/bin/bash

# enhanced-state-machine.sh - With detailed progress

STATE_FILE=".workflow_state"

PROGRESS_FILE=".workflow_progress.json"

# Initialize progress tracking

init_progress() {

cat > "$PROGRESS_FILE" << EOF

{

"current_state": "INIT",

"started_at": "$(date -Iseconds)",

"updated_at": "$(date -Iseconds)",

"phases": {

"INIT": {"status": "pending", "started": null, "completed": null},

"DOWNLOAD": {"status": "pending", "started": null, "completed": null},

"PROCESS": {"status": "pending", "started": null, "completed": null},

"VALIDATE": {"status": "pending", "started": null, "completed": null},

"FINALIZE": {"status": "pending", "started": null, "completed": null}

}

}

EOF

}

# Update progress

update_progress() {

local state="$1"

local status="$2" # "running", "completed", "failed"

local timestamp="$(date -Iseconds)"

if [ ! -f "$PROGRESS_FILE" ]; then

init_progress

fi

jq --arg state "$state" \

--arg status "$status" \

--arg timestamp "$timestamp" \

'.current_state = $state |

.updated_at = $timestamp |

.phases[$state].status = $status |

(.phases[$state].started //= $timestamp) |

(if $status == "completed" then .phases[$state].completed = $timestamp else . end)' \

"$PROGRESS_FILE" > "$PROGRESS_FILE.tmp"

mv "$PROGRESS_FILE.tmp" "$PROGRESS_FILE"

}

# Show progress

show_progress() {

if [ ! -f "$PROGRESS_FILE" ]; then

echo "No progress file found"

return

fi

echo "=== Workflow Progress ==="

echo ""

jq -r '.phases | to_entries | .[] |

"[\(.value.status | ascii_upcase)] \(.key)" +

(if .value.started then " (started: " + .value.started + ")" else "" end)' \

"$PROGRESS_FILE"

echo ""

echo "Current: $(jq -r '.current_state' $PROGRESS_FILE)"

echo "Updated: $(jq -r '.updated_at' $PROGRESS_FILE)"

}

# Workflow implementation

run_workflow() {

CURRENT_STATE=$(cat "$STATE_FILE" 2>/dev/null || echo "INIT")

# Show progress before executing

show_progress

case "$CURRENT_STATE" in

INIT)

update_progress "INIT" "running"

echo "Initializing..."

# ... initialization logic ...

update_progress "INIT" "completed"

echo "DOWNLOAD" > "$STATE_FILE"

;;

DOWNLOAD)

update_progress "DOWNLOAD" "running"

echo "Downloading..."

# ... download logic ...

update_progress "DOWNLOAD" "completed"

echo "PROCESS" > "$STATE_FILE"

;;

# ... other states ...

esac

}

run_workflow

```

Advanced Patterns

Pattern 1: Batched Checkpoint

```bash

#!/bin/bash

# batched-checkpoint.sh - Process items with batch checkpoints

ITEMS_FILE="items.txt"

CHECKPOINT_FILE=".batch_checkpoint"

BATCH_SIZE=10

# Load checkpoint

if [ -f "$CHECKPOINT_FILE" ]; then

LAST_COMPLETED=$(cat "$CHECKPOINT_FILE")

echo "Resuming from item $LAST_COMPLETED"

else

LAST_COMPLETED=0

fi

TOTAL_ITEMS=$(wc -l < "$ITEMS_FILE")

ITEMS_PROCESSED=0

# Process in batches

tail -n +$((LAST_COMPLETED + 1)) "$ITEMS_FILE" | while read -r item; do

# Process item

echo "Processing: $item"

process_item "$item"

ITEMS_PROCESSED=$((ITEMS_PROCESSED + 1))

# Checkpoint every batch

if [ $((ITEMS_PROCESSED % BATCH_SIZE)) -eq 0 ]; then

CURRENT_POSITION=$((LAST_COMPLETED + ITEMS_PROCESSED))

echo "$CURRENT_POSITION" > "$CHECKPOINT_FILE"

echo "Checkpoint: $CURRENT_POSITION/$TOTAL_ITEMS"

# Optional: Break for timeout management

if [ $((ITEMS_PROCESSED)) -ge $((BATCH_SIZE * 5)) ]; then

echo "Processed 5 batches ($(($BATCH_SIZE * 5)) items)"

echo "Run again to continue"

exit 0

fi

fi

done

# Final checkpoint

echo "$TOTAL_ITEMS" > "$CHECKPOINT_FILE"

echo "βœ“ All items processed!"

```

Pattern 2: Rollback Support

```bash

#!/bin/bash

# rollback-workflow.sh - State machine with rollback

STATE_FILE=".state"

ROLLBACK_DIR=".rollback"

mkdir -p "$ROLLBACK_DIR"

# Save rollback point

save_rollback() {

local state="$1"

local timestamp=$(date +%s)

tar -czf "$ROLLBACK_DIR/${state}_${timestamp}.tar.gz" workspace/ 2>/dev/null || true

echo "${state}_${timestamp}" > "$ROLLBACK_DIR/latest"

}

# Perform rollback

rollback() {

local rollback_point="$1"

if [ -z "$rollback_point" ]; then

rollback_point=$(cat "$ROLLBACK_DIR/latest" 2>/dev/null)

fi

if [ -z "$rollback_point" ] || [ ! -f "$ROLLBACK_DIR/${rollback_point}.tar.gz" ]; then

echo "βœ— No rollback point found"

exit 1

fi

echo "Rolling back to: $rollback_point"

# Restore from backup

rm -rf workspace/

tar -xzf "$ROLLBACK_DIR/${rollback_point}.tar.gz"

# Set state

STATE=$(echo "$rollback_point" | cut -d'_' -f1)

echo "$STATE" > "$STATE_FILE"

echo "βœ“ Rolled back to state: $STATE"

}

# In workflow

case "$CURRENT_STATE" in

DOWNLOAD)

save_rollback "PRE_DOWNLOAD"

# ... download logic ...

;;

PROCESS)

save_rollback "PRE_PROCESS"

# ... process logic ...

;;

esac

```

Pattern 3: Parallel Phase Execution

```bash

#!/bin/bash

# parallel-phases.sh - Execute independent phases in parallel

STATE_FILE=".parallel_state.json"

# Initialize parallel state

init_parallel_state() {

cat > "$STATE_FILE" << EOF

{

"phase_a": "pending",

"phase_b": "pending",

"phase_c": "pending",

"finalize": "pending"

}

EOF

}

# Update phase status

update_phase() {

local phase="$1"

local status="$2"

jq --arg phase "$phase" --arg status "$status" \

'.[$phase] = $status' "$STATE_FILE" > "$STATE_FILE.tmp"

mv "$STATE_FILE.tmp" "$STATE_FILE"

}

# Check if all phases complete

all_phases_complete() {

local incomplete=$(jq -r '[.phase_a, .phase_b, .phase_c] | map(select(. != "completed")) | length' "$STATE_FILE")

[ "$incomplete" -eq 0 ]

}

# Execute phase

execute_phase() {

local phase="$1"

local current_status=$(jq -r ".$phase" "$STATE_FILE")

if [ "$current_status" = "completed" ]; then

echo "$phase: Already complete"

return 0

fi

echo "$phase: Running"

update_phase "$phase" "running"

# Phase-specific logic

case "$phase" in

phase_a)

# Independent task A

task_a

;;

phase_b)

# Independent task B

task_b

;;

phase_c)

# Independent task C

task_c

;;

esac

if [ $? -eq 0 ]; then

update_phase "$phase" "completed"

echo "$phase: Completed"

else

update_phase "$phase" "failed"

echo "$phase: Failed"

return 1

fi

}

# Main workflow

[ ! -f "$STATE_FILE" ] && init_parallel_state

# Run independent phases

execute_phase "phase_a"

execute_phase "phase_b"

execute_phase "phase_c"

# Check if ready for finalization

if all_phases_complete; then

finalize_status=$(jq -r '.finalize' "$STATE_FILE")

if [ "$finalize_status" != "completed" ]; then

echo "All phases complete. Finalizing..."

finalize

update_phase "finalize" "completed"

echo "βœ“ Workflow complete!"

fi

else

echo "Some phases incomplete. Run again to retry."

fi

```

Real-World Examples

Example 1: Database Migration Workflow

```bash

#!/bin/bash

# db-migration-workflow.sh

STATE_FILE=".migration_state"

MIGRATION_LOG="migration.log"

log() {

echo "[$(date -Iseconds)] $1" | tee -a "$MIGRATION_LOG"

}

CURRENT_STATE=$(cat "$STATE_FILE" 2>/dev/null || echo "INIT")

case "$CURRENT_STATE" in

INIT)

log "=== Starting Migration ==="

log "Checking prerequisites..."

# Verify database connection

psql -U user -d database -c "SELECT 1" > /dev/null 2>&1 || {

log "βœ— Database connection failed"

exit 1

}

# Verify backup destination

[ -d "/backups" ] || mkdir -p /backups

echo "BACKUP" > "$STATE_FILE"

log "βœ“ Prerequisites checked"

;;

BACKUP)

log "=== Creating Backup ==="

BACKUP_FILE="/backups/db_$(date +%Y%m%d_%H%M%S).sql"

pg_dump -U user database > "$BACKUP_FILE"

if [ -f "$BACKUP_FILE" ] && [ -s "$BACKUP_FILE" ]; then

log "βœ“ Backup created: $BACKUP_FILE"

echo "MIGRATE" > "$STATE_FILE"

else

log "βœ— Backup failed"

exit 1

fi

;;

MIGRATE)

log "=== Running Migration ==="

# Apply migration scripts in order

for script in migrations/*.sql; do

log "Applying: $(basename $script)"

psql -U user -d database -f "$script" >> "$MIGRATION_LOG" 2>&1

if [ $? -ne 0 ]; then

log "βœ— Migration failed: $script"

log "To rollback: ./db-migration-workflow.sh rollback"

exit 1

fi

done

echo "VALIDATE" > "$STATE_FILE"

log "βœ“ Migrations applied"

;;

VALIDATE)

log "=== Validating Migration ==="

# Run validation queries

VALIDATION_RESULT=$(psql -U user -d database -f validation.sql 2>&1)

if echo "$VALIDATION_RESULT" | grep -q "PASS"; then

log "βœ“ Validation passed"

echo "COMPLETE" > "$STATE_FILE"

else

log "βœ— Validation failed"

log "$VALIDATION_RESULT"

log "Review and fix, then change state to MIGRATE to retry"

exit 1

fi

;;

COMPLETE)

log "=== Migration Complete ==="

log "Database migrated successfully"

;;

rollback)

log "=== Rolling Back Migration ==="

# Find latest backup

LATEST_BACKUP=$(ls -t /backups/db_*.sql | head -1)

if [ -z "$LATEST_BACKUP" ]; then

log "βœ— No backup found"

exit 1

fi

log "Restoring from: $LATEST_BACKUP"

psql -U user -d database < "$LATEST_BACKUP"

log "βœ“ Rollback complete"

echo "INIT" > "$STATE_FILE"

;;

esac

```

Example 2: Data Pipeline Workflow

```bash

#!/bin/bash

# data-pipeline.sh - ETL pipeline with checkpoints

STATE_FILE=".pipeline_state.json"

init_state() {

cat > "$STATE_FILE" << EOF

{

"state": "EXTRACT",

"extracted_files": [],

"transformed_files": [],

"loaded_records": 0,

"total_records": 0

}

EOF

}

[ ! -f "$STATE_FILE" ] && init_state

CURRENT_STATE=$(jq -r '.state' "$STATE_FILE")

case "$CURRENT_STATE" in

EXTRACT)

echo "=== Extract Phase ==="

# Extract from sources

for source in source1 source2 source3; do

# Check if already extracted

if jq -e ".extracted_files | index(\"$source\")" "$STATE_FILE" > /dev/null; then

echo "βœ“ $source already extracted"

continue

fi

echo "Extracting from: $source"

extract_from_source "$source" > "raw_data/$source.csv"

# Update state

jq ".extracted_files += [\"$source\"]" "$STATE_FILE" > "$STATE_FILE.tmp"

mv "$STATE_FILE.tmp" "$STATE_FILE"

done

# Move to next state

jq '.state = "TRANSFORM"' "$STATE_FILE" > "$STATE_FILE.tmp"

mv "$STATE_FILE.tmp" "$STATE_FILE"

echo "βœ“ Extraction complete"

;;

TRANSFORM)

echo "=== Transform Phase ==="

for file in raw_data/*.csv; do

filename=$(basename "$file")

# Check if already transformed

if jq -e ".transformed_files | index(\"$filename\")" "$STATE_FILE" > /dev/null; then

echo "βœ“ $filename already transformed"

continue

fi

echo "Transforming: $filename"

transform_data "$file" > "processed_data/$filename"

# Update state

jq ".transformed_files += [\"$filename\"]" "$STATE_FILE" > "$STATE_FILE.tmp"

mv "$STATE_FILE.tmp" "$STATE_FILE"

done

# Count total records for loading

TOTAL=$(wc -l processed_data/*.csv | tail -1 | awk '{print $1}')

jq ".total_records = $TOTAL | .state = \"LOAD\"" "$STATE_FILE" > "$STATE_FILE.tmp"

mv "$STATE_FILE.tmp" "$STATE_FILE"

echo "βœ“ Transformation complete"

;;

LOAD)

echo "=== Load Phase ==="

LOADED=$(jq -r '.loaded_records' "$STATE_FILE")

TOTAL=$(jq -r '.total_records' "$STATE_FILE")

echo "Progress: $LOADED/$TOTAL records"

# Load data in batches

BATCH_SIZE=1000

tail -n +$((LOADED + 1)) processed_data/combined.csv | head -$BATCH_SIZE | while read record; do

load_record "$record"

LOADED=$((LOADED + 1))

# Checkpoint every 100 records

if [ $((LOADED % 100)) -eq 0 ]; then

jq ".loaded_records = $LOADED" "$STATE_FILE" > "$STATE_FILE.tmp"

mv "$STATE_FILE.tmp" "$STATE_FILE"

fi

done

# Check if complete

LOADED=$(jq -r '.loaded_records' "$STATE_FILE")

if [ "$LOADED" -ge "$TOTAL" ]; then

jq '.state = "COMPLETE"' "$STATE_FILE" > "$STATE_FILE.tmp"

mv "$STATE_FILE.tmp" "$STATE_FILE"

echo "βœ“ Loading complete"

else

echo "Loaded $LOADED/$TOTAL - run again to continue"

fi

;;

COMPLETE)

echo "=== Pipeline Complete ==="

jq '.' "$STATE_FILE"

;;

esac

```

Best Practices

βœ… DO

  1. Design clear states - Each state has single responsibility
  2. Save frequently - Checkpoint after each significant operation
  3. Validate transitions - Verify state before transitioning
  4. Handle failures - Plan for failure at every state
  5. Log everything - Comprehensive logging for debugging
  6. Enable inspection - Make state file human-readable (JSON)
  7. Support rollback - Save enough info to undo
  8. Test resumption - Verify workflow resumes correctly

❌ DON'T

  1. Don't mix concerns - Keep states focused
  2. Don't skip validation - Validate before each transition
  3. Don't hide state - Make current state obvious
  4. Don't lose progress - Always save before risky operations
  5. Don't ignore errors - Handle errors explicitly
  6. Don't hardcode paths - Use configuration
  7. Don't forget cleanup - Remove checkpoint files when done
  8. Don't nest too deep - Keep state machine flat

Quick Reference

```bash

# Basic state machine

STATE=$(cat .state || echo "INIT")

case "$STATE" in

STATE1) action1; echo "STATE2" > .state ;;

STATE2) action2; echo "STATE3" > .state ;;

esac

# With progress tracking (JSON)

jq '.state = "NEW_STATE" | .updated = "'$(date -Iseconds)'"' .state.json

# Checkpoint pattern

process_batch && echo "$POSITION" > .checkpoint

# Rollback support

tar -czf .rollback/pre_${STATE}.tar.gz data/

# Reset workflow

echo "INIT" > .state && rm .progress.json

```

---

Version: 1.0.0

Author: Harvested from timeout-prevention and state machine patterns

Last Updated: 2025-11-18

License: MIT

Key Principle: Every operation should be resumable from its last successful checkpoint.

Build workflows that never lose progress! πŸ”„