# data-orchestrator

Skill from dreamineering/meme-times

## What it does

Orchestrates comprehensive data pipelines for AI-driven crypto trading, integrating multi-source data validation, governance, and real-time aggregation.


## Installation

Install the skill:

```bash
npx skills add https://github.com/dreamineering/meme-times --skill data-orchestrator
```


# Data Orchestrator - AI Trading Data Strategy Layer

Central nervous system for all data operations across the meme-times ecosystem. Implements a robust, AI-ready data strategy that precedes and enables all trading decisions.

## Core Principle

> Data strategy comes BEFORE AI. Clean governance, validated sources, and secure infrastructure enable useful trading insights.

## Activation Triggers

  • "Build a data pipeline for [use case]"
  • "Validate data quality for [source]"
  • "Set up backtesting infrastructure"
  • "Aggregate data from multiple sources"
  • "Data governance for [trading strategy]"
  • "Real-time data feed for [token/market]"
  • Keywords: data pipeline, data quality, validation, aggregation, backtesting, ML training, historical data, real-time feed, data governance

## Data Strategy Pillars

### 1. Diverse Data Sources

Price & Market Data:

| Source | Data Type | Latency | Quality | Cost |
|--------|-----------|---------|---------|------|
| Dexscreener API | DEX prices, pools | Real-time | High | Free tier |
| Birdeye API | Solana tokens, analytics | Real-time | High | Paid |
| Jupiter Price API | Solana swap prices | Real-time | High | Free |
| CoinGecko | Cross-chain prices | 1-5 min | High | Free tier |
| CoinAPI | Historical + streaming | Configurable | Premium | Paid |

On-Chain Data:

| Source | Data Type | Latency | Quality | Cost |
|--------|-----------|---------|---------|------|
| Helius RPC | Solana transactions | Real-time | High | Freemium |
| Solscan API | Token info, holders | Near real-time | High | Free tier |
| Dune Analytics | SQL queries | Minutes-hours | High | Freemium |
| Flipside Crypto | Pre-built datasets | Hours | High | Free |

Sentiment & Social:

| Source | Data Type | Latency | Quality | Cost |
|--------|-----------|---------|---------|------|
| Twitter/X API | Social mentions | Real-time | Medium | Paid |
| LunarCrush | Social metrics | Near real-time | High | Paid |
| Telegram scraping | Community sentiment | Real-time | Low | DIY |
| Reddit API | Discussion sentiment | Minutes | Medium | Free |

DeFi Protocol Data:

| Source | Data Type | Latency | Quality | Cost |
|--------|-----------|---------|---------|------|
| DefiLlama API | TVL, revenue, yields | 15 min | High | Free |
| Token Terminal | Revenue, P/E ratios | Daily | High | Paid |
| DeFi Pulse | TVL rankings | Hourly | Medium | Free |

### 2. Data Quality & Governance

Quality Dimensions:

```typescript
interface DataQualityMetrics {
  accuracy: number;     // 0-100: correctness vs. ground truth
  completeness: number; // 0-100: share of expected values present
  timeliness: number;   // 0-100: freshness score
  consistency: number;  // 0-100: cross-source agreement
  validity: number;     // 0-100: schema conformance
}

interface QualityThresholds {
  trading_signals: { min: 90; critical: 'timeliness' };
  historical_analysis: { min: 85; critical: 'completeness' };
  sentiment_analysis: { min: 70; critical: 'consistency' };
  backtesting: { min: 95; critical: 'accuracy' };
}
```
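
For orientation, a rolled-up score could be computed along these lines; the equal weighting and helper names are assumptions of this sketch, not part of the shipped scripts:

```typescript
// Sketch: average the five dimensions, then gate against a use case's
// threshold. The critical dimension must clear the bar on its own,
// not just via the average.
type Dimension = keyof DataQualityMetrics;

function overallScore(m: DataQualityMetrics): number {
  const dims: Dimension[] = ['accuracy', 'completeness', 'timeliness', 'consistency', 'validity'];
  return dims.reduce((sum, d) => sum + m[d], 0) / dims.length;
}

function passesThreshold(m: DataQualityMetrics, min: number, critical: Dimension): boolean {
  return overallScore(m) >= min && m[critical] >= min;
}
```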

Validation Pipeline:

```
Raw Data → Schema Validation → Anomaly Detection → Cross-Source Check → Quality Score → Accept/Reject
```

Validation Rules:

  1. Schema Validation: All data must match expected types/formats
  2. Range Checks: Prices, volumes, percentages within valid bounds
  3. Anomaly Detection: Flag outliers > 3 standard deviations
  4. Cross-Source Verification: Compare with 2+ sources for critical data
  5. Freshness Enforcement: Reject stale data beyond threshold
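
A compact sketch of the five rules above; the `PriceRecord` shape and the 1% cross-source tolerance are assumptions made for illustration:

```typescript
// Illustrative validator for the five rules above.
interface PriceRecord {
  token: string;
  price: number;
  volume: number;
  timestamp: number; // unix ms
}

function validate(record: PriceRecord, history: number[], peerPrices: number[]): string[] {
  const errors: string[] = [];

  // 1. Schema validation: required fields and types
  if (typeof record.token !== 'string' || !Number.isFinite(record.price)) errors.push('schema');

  // 2. Range checks: price and volume within valid bounds
  if (record.price <= 0 || record.volume < 0) errors.push('range');

  // 3. Anomaly detection: flag outliers > 3 standard deviations
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const std = Math.sqrt(history.reduce((a, b) => a + (b - mean) ** 2, 0) / history.length);
  if (std > 0 && Math.abs(record.price - mean) > 3 * std) errors.push('anomaly');

  // 4. Cross-source verification: 2+ peer sources within 1% (assumed tolerance)
  const agreeing = peerPrices.filter(p => Math.abs(p - record.price) / record.price < 0.01);
  if (agreeing.length < 2) errors.push('cross-source');

  // 5. Freshness enforcement: reject data older than the 30s live-trading gate
  if (Date.now() - record.timestamp > 30_000) errors.push('stale');

  return errors; // empty = accept
}
```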

Automated Quality Monitoring:

```bash
# Run continuous quality checks
npx tsx .claude/skills/data-orchestrator/scripts/quality-monitor.ts \
  --sources "dexscreener,birdeye,jupiter" \
  --interval 60 \
  --alert-threshold 80
```

### 3. Real-Time Data Pipeline Architecture

```
┌────────────────────────────────────────────────────────────────────┐
│                        DATA INGESTION LAYER                        │
├────────────────────────────────────────────────────────────────────┤
│  WebSocket Feeds     │    REST Polling      │  RPC Subscriptions   │
│  (Dexscreener, WS)   │     (CoinGecko)      │  (Helius, Solana)    │
└──────────┬───────────┴──────────┬───────────┴──────────┬───────────┘
           │                      │                      │
           ▼                      ▼                      ▼
┌────────────────────────────────────────────────────────────────────┐
│                      VALIDATION & ENRICHMENT                       │
├────────────────────────────────────────────────────────────────────┤
│ Schema Check   │ Anomaly Flag   │ Cross-Verify   │ Quality Score   │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
                                  ▼
┌────────────────────────────────────────────────────────────────────┐
│                         DATA STORAGE LAYER                         │
├────────────────────────────────────────────────────────────────────┤
│ Hot Store (Redis)    │ Warm Store (SQLite)   │ Cold (Parquet)      │
│ Real-time prices     │ Recent history (7d)   │ Historical data     │
│ TTL: 5 minutes       │ Indexed, queryable    │ Compressed, ML      │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
                                  ▼
┌────────────────────────────────────────────────────────────────────┐
│                         CONSUMPTION LAYER                          │
├────────────────────────────────────────────────────────────────────┤
│ Trading Signals │ ML Training     │ Backtesting      │ Dashboards  │
│ (meme-trader)   │ (llama-analyst) │ (meme-executor)  │ (Reports)   │
└────────────────────────────────────────────────────────────────────┘
```

Data Flow Configuration:

```typescript
interface PipelineConfig {
  sources: DataSource[];
  validationRules: ValidationRule[];
  enrichmentSteps: EnrichmentFunction[];
  storageTargets: StorageTarget[];
  alertsEnabled: boolean;
  qualityThreshold: number;
}

const defaultPipeline: PipelineConfig = {
  sources: [
    { name: 'dexscreener', type: 'websocket', priority: 1 },
    { name: 'birdeye', type: 'rest', priority: 2 },
    { name: 'jupiter', type: 'rest', priority: 3 },
  ],
  validationRules: ['schema', 'range', 'anomaly', 'freshness'],
  enrichmentSteps: ['normalize', 'calculate_indicators', 'tag_quality'],
  storageTargets: ['redis:hot', 'sqlite:warm'],
  alertsEnabled: true,
  qualityThreshold: 85,
};
```
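
The `priority` field drives source failover (see Error Handling below): when a higher-priority feed degrades, the pipeline falls back to the next one. A sketch, with `isHealthy` standing in for whatever health probe is actually used:

```typescript
// Sketch: pick the highest-priority healthy source (priority 1 = highest).
// The DataSource shape mirrors the config above; isHealthy is assumed.
interface DataSource {
  name: string;
  type: 'websocket' | 'rest';
  priority: number;
}

function activeSource(
  sources: DataSource[],
  isHealthy: (s: DataSource) => boolean,
): DataSource | undefined {
  return [...sources]
    .sort((a, b) => a.priority - b.priority)
    .find(isHealthy); // undefined means every source is down
}
```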

### 4. ML/AI Integration Framework

Data Preparation for ML:

```typescript
interface MLReadyDataset {
  features: {
    price_data: TimeSeriesFeatures;
    volume_data: TimeSeriesFeatures;
    onchain_metrics: OnChainFeatures;
    sentiment_scores: SentimentFeatures;
    technical_indicators: TechnicalFeatures;
  };
  labels: {
    price_direction: 'up' | 'down' | 'sideways';
    price_magnitude: number;
    optimal_action: 'buy' | 'sell' | 'hold';
  };
  metadata: {
    timestamp: Date;
    token: string;
    quality_score: number;
    source_count: number;
  };
}

interface TimeSeriesFeatures {
  values: number[];
  timestamps: Date[];
  normalized: number[]; // Z-score normalized
  lagged: number[][];   // [lag_1, lag_5, lag_15, lag_60]
  rolling_stats: {
    mean_5: number[];
    std_5: number[];
    mean_15: number[];
    std_15: number[];
  };
}
```
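
For the shape of the computation, `normalized` and `lagged` might be derived like this (hypothetical helpers, not the shipped feature builder):

```typescript
// Z-score normalize a series: (x - mean) / std.
function zScore(values: number[]): number[] {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const std = Math.sqrt(values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length);
  return values.map(v => (std === 0 ? 0 : (v - mean) / std));
}

// For each observation, collect the value `lag` steps back
// ([lag_1, lag_5, lag_15, lag_60]); NaN where history is too short.
function lagFeatures(values: number[], lags: number[] = [1, 5, 15, 60]): number[][] {
  return values.map((_, i) => lags.map(lag => (i - lag >= 0 ? values[i - lag] : NaN)));
}
```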

Supported ML Techniques:

  1. Anomaly Detection: Isolation forests for unusual price/volume patterns
  2. NLP/Sentiment: BERT-based sentiment from social feeds
  3. Time Series: LSTM/Transformer for price prediction
  4. Classification: XGBoost for buy/sell signal classification
  5. Reinforcement Learning: DQN for optimal trade execution

Continuous Learning Pipeline:

```
New Data → Feature Extraction → Model Inference → Signal Generation
                                       ↑                  │
                                       │                  ▼
                    Model Retraining (Weekly) ← Performance Feedback
```

### 5. Backtesting Infrastructure

Historical Data Requirements:

```typescript
interface BacktestDataset {
  token: string;
  timeframe: '1m' | '5m' | '15m' | '1h' | '4h' | '1d';
  start_date: Date;
  end_date: Date;
  data_points: {
    timestamp: Date;
    open: number;
    high: number;
    low: number;
    close: number;
    volume: number;
    liquidity: number;
    holders: number;
    sentiment_score?: number;
  }[];
  quality_metrics: DataQualityMetrics;
}
```
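
Coverage for a run can be estimated up front by comparing the candles received against the count the timeframe implies; a sketch with an assumed interval table:

```typescript
// Sketch: expected candle count from the timeframe, then coverage %.
const TIMEFRAME_MS: Record<BacktestDataset['timeframe'], number> = {
  '1m': 60_000, '5m': 300_000, '15m': 900_000,
  '1h': 3_600_000, '4h': 14_400_000, '1d': 86_400_000,
};

function coveragePct(ds: BacktestDataset): number {
  const spanMs = ds.end_date.getTime() - ds.start_date.getTime();
  const expected = Math.floor(spanMs / TIMEFRAME_MS[ds.timeframe]);
  return expected === 0 ? 0 : (ds.data_points.length / expected) * 100;
}
```

A dataset falling below the 95% completeness gate (see Quality Gates) should be rejected before the backtest runs.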

Backtest Execution:

```bash
# Run backtest with historical data
npx tsx .claude/skills/data-orchestrator/scripts/backtest-runner.ts \
  --strategy "momentum" \
  --token "BONK" \
  --start "2024-01-01" \
  --end "2024-12-01" \
  --initial-capital 1000 \
  --slippage 0.01
```

Backtest Report Output:

```

BACKTEST REPORT: Momentum Strategy on BONK

Period: 2024-01-01 to 2024-12-01

PERFORMANCE:

  • Total Return: +234.5%
  • Sharpe Ratio: 1.87
  • Max Drawdown: -28.3%
  • Win Rate: 62.4%
  • Profit Factor: 2.15

TRADES:

  • Total Trades: 156
  • Avg Trade Duration: 4.2 hours
  • Best Trade: +45.2%
  • Worst Trade: -12.8%

DATA QUALITY:

  • Coverage: 99.2%
  • Missing Points: 847 / 105,120
  • Quality Score: 94/100

CAVEATS:

  • Historical results do not guarantee future performance
  • Slippage model: 1% (actual may vary)
  • Does not account for: MEV, extreme volatility periods

```
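
The trade statistics in the report follow the conventional definitions; a sketch:

```typescript
// Win rate = winning trades / all trades;
// profit factor = gross profit / gross loss.
interface Trade { pnl: number } // per-trade profit/loss in quote currency

function tradeStats(trades: Trade[]) {
  const wins = trades.filter(t => t.pnl > 0);
  const grossProfit = wins.reduce((sum, t) => sum + t.pnl, 0);
  const grossLoss = Math.abs(
    trades.filter(t => t.pnl < 0).reduce((sum, t) => sum + t.pnl, 0),
  );
  return {
    winRate: (wins.length / trades.length) * 100,                       // e.g. 62.4
    profitFactor: grossLoss === 0 ? Infinity : grossProfit / grossLoss, // e.g. 2.15
  };
}
```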

### 6. Risk & Portfolio Data Layer

Portfolio Metrics:

```typescript
interface PortfolioData {
  positions: {
    token: string;
    entry_price: number;
    current_price: number;
    size: number;
    unrealized_pnl: number;
    allocation_pct: number;
    risk_score: number;
  }[];
  aggregate: {
    total_value: number;
    total_pnl: number;
    daily_var: number;           // Value at Risk (95%)
    beta_to_sol: number;
    concentration_score: number; // Herfindahl index
  };
  limits: {
    max_position_size: number;
    max_daily_loss: number;
    max_correlation: number;
    stop_loss_pct: number;
  };
}
```
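
The `concentration_score` is a Herfindahl index: the sum of squared allocation weights, ranging from 1/n for an equal-weight book up to 1.0 for a single position. A sketch:

```typescript
// Herfindahl index over position allocations (allocation_pct in percent).
function herfindahl(positions: { allocation_pct: number }[]): number {
  return positions
    .map(p => p.allocation_pct / 100) // percent -> weight
    .reduce((sum, w) => sum + w * w, 0);
}
```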

Risk Data Sources:

  • Position tracking from meme-executor
  • Price volatility from historical data
  • Correlation matrix from cross-asset analysis
  • Liquidity depth from DEX APIs

## Implementation Scripts

### Data Pipeline Manager

```bash
# Start the data pipeline
npx tsx .claude/skills/data-orchestrator/scripts/pipeline-manager.ts \
  --mode production \
  --sources all \
  --storage sqlite,redis

# Validate specific data source
npx tsx .claude/skills/data-orchestrator/scripts/validate-source.ts \
  --source dexscreener \
  --token "BONK" \
  --verbose

# Generate ML-ready dataset
npx tsx .claude/skills/data-orchestrator/scripts/ml-dataset-builder.ts \
  --token "BONK" \
  --features "price,volume,sentiment" \
  --lookback 30 \
  --output ./datasets/bonk_ml_ready.parquet
```

## Integration with Other Skills

Data Orchestrator provides to:

  • meme-trader: Validated price/volume data, quality scores
  • llama-analyst: DeFi protocol metrics, TVL/revenue time series
  • meme-executor: Real-time execution prices, slippage estimates
  • flow-tracker: On-chain flow data, whale movements
  • degen-savant: Sentiment aggregates, social momentum

Data Orchestrator receives from:

  • All skills: Data quality feedback, missing data requests
  • meme-executor: Trade execution data for backtest validation

## Quality Gates

  • All data must have quality score >= 80% for trading signals
  • Price data staleness: max 30 seconds for live trading
  • Historical data completeness: min 95% for backtesting
  • Cross-source agreement: min 2 sources for critical decisions
  • Schema validation: 100% compliance required
  • Anomaly flagging: auto-reject data points > 5 sigma
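
Taken together, the gates reduce to a single pre-trade check; a sketch with illustrative field names:

```typescript
// Sketch of the gates above as one predicate. Field names are assumptions.
interface SignalInput {
  qualityScore: number; // 0-100
  priceAgeMs: number;   // staleness of the latest price
  sourceCount: number;  // independent sources in agreement
  schemaValid: boolean;
  sigma: number;        // distance from rolling mean, in std deviations
}

function passesQualityGates(s: SignalInput): boolean {
  return (
    s.qualityScore >= 80 &&   // trading-signal floor
    s.priceAgeMs <= 30_000 && // 30s max staleness for live trading
    s.sourceCount >= 2 &&     // cross-source agreement
    s.schemaValid &&          // 100% schema compliance
    s.sigma <= 5              // auto-reject beyond 5 sigma
  );
}
```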

## Error Handling

  • Source unavailable: Failover to backup source within 5 seconds
  • Quality below threshold: Alert + fallback to last good data
  • Schema mismatch: Log error, use default values, alert
  • Rate limit hit: Exponential backoff, rotate API keys
  • Network timeout: Retry 3x with increasing delay
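
The retry and backoff behavior, sketched (the base delay and jitter are assumptions):

```typescript
// Retry up to 3 times with exponential backoff plus a little jitter.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseMs = 500): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const delayMs = baseMs * 2 ** i + Math.random() * 100;
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```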

## Compliance & Security

  • Encrypt API keys at rest and in transit
  • Log all data access for audit trails
  • Implement IP whitelisting for production
  • GDPR-compliant: no PII in trading data
  • Rate limit all external calls (prevent API bans)
  • Regular security audits of data pipeline

## Performance Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Data latency (real-time) | < 500ms | WebSocket to storage |
| Validation throughput | > 10K records/sec | Validation pipeline |
| Quality score accuracy | > 95% | vs manual audit |
| Backtest data coverage | > 99% | Historical completeness |
| ML feature freshness | < 5 min | Feature store update |

## Resources

  • references/data-sources.md - Complete API documentation
  • references/quality-standards.md - Validation rule definitions
  • references/ml-feature-catalog.md - Available ML features
  • scripts/pipeline-manager.ts - Main orchestration script
  • scripts/quality-monitor.ts - Continuous quality monitoring
  • scripts/backtest-runner.ts - Historical backtesting