data-orchestrator

Skill from dreamineering/meme-times

Orchestrates comprehensive data pipelines for AI-driven crypto trading, integrating multi-source data validation, governance, and real-time aggregation.

Installation

npx skills add https://github.com/dreamineering/meme-times --skill data-orchestrator
# Data Orchestrator - AI Trading Data Strategy Layer
Central nervous system for all data operations across the meme-times ecosystem. Implements a robust, AI-ready data strategy that precedes and enables all trading decisions.
Core Principle
> Data strategy comes BEFORE AI. Clean governance, validated sources, and secure infrastructure enable useful trading insights.
Activation Triggers
- "Build a data pipeline for [use case]"
- "Validate data quality for [source]"
- "Set up backtesting infrastructure"
- "Aggregate data from multiple sources"
- "Data governance for [trading strategy]"
- "Real-time data feed for [token/market]"
- Keywords: data pipeline, data quality, validation, aggregation, backtesting, ML training, historical data, real-time feed, data governance
Data Strategy Pillars
1. Diverse Data Sources
Price & Market Data:
| Source | Data Type | Latency | Quality | Cost |
|--------|-----------|---------|---------|------|
| Dexscreener API | DEX prices, pools | Real-time | High | Free tier |
| Birdeye API | Solana tokens, analytics | Real-time | High | Paid |
| Jupiter Price API | Solana swap prices | Real-time | High | Free |
| CoinGecko | Cross-chain prices | 1-5 min | High | Free tier |
| CoinAPI | Historical + streaming | Configurable | Premium | Paid |
On-Chain Data:
| Source | Data Type | Latency | Quality | Cost |
|--------|-----------|---------|---------|------|
| Helius RPC | Solana transactions | Real-time | High | Freemium |
| Solscan API | Token info, holders | Near real-time | High | Free tier |
| Dune Analytics | SQL queries | Minutes-hours | High | Freemium |
| Flipside Crypto | Pre-built datasets | Hours | High | Free |
Sentiment & Social:
| Source | Data Type | Latency | Quality | Cost |
|--------|-----------|---------|---------|------|
| Twitter/X API | Social mentions | Real-time | Medium | Paid |
| LunarCrush | Social metrics | Near real-time | High | Paid |
| Telegram scraping | Community sentiment | Real-time | Low | DIY |
| Reddit API | Discussion sentiment | Minutes | Medium | Free |
DeFi Protocol Data:
| Source | Data Type | Latency | Quality | Cost |
|--------|-----------|---------|---------|------|
| DefiLlama API | TVL, revenue, yields | 15 min | High | Free |
| Token Terminal | Revenue, P/E ratios | Daily | High | Paid |
| DeFi Pulse | TVL rankings | Hourly | Medium | Free |
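When several of these sources cover the same data, a priority ordering can drive failover between them. A minimal sketch of that selection logic (the source names mirror the tables above, but the `healthy` flag and the function are illustrative):

```typescript
// Prefer the lowest priority number among currently healthy sources.
interface SourceStatus {
  name: string;
  priority: number; // 1 = most preferred
  healthy: boolean;
}

function pickSource(sources: SourceStatus[]): string | null {
  const candidates = sources
    .filter((s) => s.healthy)
    .sort((a, b) => a.priority - b.priority);
  return candidates.length > 0 ? candidates[0].name : null;
}

// Example: Dexscreener is down, so Birdeye (next priority) is chosen.
const chosen = pickSource([
  { name: "dexscreener", priority: 1, healthy: false },
  { name: "birdeye", priority: 2, healthy: true },
  { name: "jupiter", priority: 3, healthy: true },
]);
// chosen === "birdeye"
```

In practice the `healthy` flag would come from a health monitor rather than being passed in by hand.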
2. Data Quality & Governance
Quality Dimensions:
```typescript
interface DataQualityMetrics {
  accuracy: number;     // 0-100: correctness vs ground truth
  completeness: number; // 0-100: share of expected values present
  timeliness: number;   // 0-100: freshness score
  consistency: number;  // 0-100: cross-source agreement
  validity: number;     // 0-100: schema conformance
}

interface QualityThresholds {
  trading_signals: { min: 90, critical: 'timeliness' };
  historical_analysis: { min: 85, critical: 'completeness' };
  sentiment_analysis: { min: 70, critical: 'consistency' };
  backtesting: { min: 95, critical: 'accuracy' };
}
```
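The skill does not spell out how the five dimensions combine into a single score. One plausible sketch, assuming an unweighted mean capped by the use case's critical dimension (both the weighting and the cap are assumptions, not part of the skill):

```typescript
interface DataQualityMetrics {
  accuracy: number;
  completeness: number;
  timeliness: number;
  consistency: number;
  validity: number;
}

// Hypothetical scoring: unweighted mean of all dimensions, but the
// use case's critical dimension caps the overall score so one weak
// critical dimension cannot be averaged away.
function qualityScore(
  m: DataQualityMetrics,
  critical: keyof DataQualityMetrics
): number {
  const values = Object.values(m);
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  return Math.min(mean, m[critical]);
}

const score = qualityScore(
  { accuracy: 95, completeness: 90, timeliness: 60, consistency: 92, validity: 98 },
  "timeliness" // critical dimension for trading_signals
);
// score === 60: the mean is 87, but the stale feed caps it
```

Under this scheme a feed that is accurate but stale still fails the `trading_signals` threshold, matching the intent of the `critical: 'timeliness'` entry above.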
Validation Pipeline:
```
Raw Data → Schema Validation → Anomaly Detection → Cross-Source Check → Quality Score → Accept/Reject
```
Validation Rules:
- Schema Validation: All data must match expected types/formats
- Range Checks: Prices, volumes, percentages within valid bounds
- Anomaly Detection: Flag outliers > 3 standard deviations
- Cross-Source Verification: Compare with 2+ sources for critical data
- Freshness Enforcement: Reject stale data beyond threshold
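The 3-sigma rule above can be sketched as a plain z-score filter. This is illustrative only; a production detector would use rolling windows per token rather than the whole sample:

```typescript
// Anomaly rule: flag points more than `sigma` standard deviations
// from the sample mean (population std, whole-sample baseline).
function zScoreOutliers(values: number[], sigma = 3): number[] {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance);
  if (std === 0) return [];
  return values.filter((v) => Math.abs(v - mean) / std > sigma);
}

// Twenty quiet ticks plus one spike: only the spike is flagged.
const flagged = zScoreOutliers([...Array(20).fill(10), 1000]);
```

Note the caveat with tiny samples: a single extreme point inflates the standard deviation, so very short windows can hide outliers from a 3-sigma test.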
Automated Quality Monitoring:
```bash
# Run continuous quality checks
npx tsx .claude/skills/data-orchestrator/scripts/quality-monitor.ts \
  --sources "dexscreener,birdeye,jupiter" \
  --interval 60 \
  --alert-threshold 80
```
3. Real-Time Data Pipeline Architecture
```
┌────────────────────────────────────────────────────────────────┐
│                      DATA INGESTION LAYER                      │
├────────────────────────────────────────────────────────────────┤
│  WebSocket Feeds    │   REST Polling   │  RPC Subscriptions    │
│  (Dexscreener, WS)  │    (CoinGecko)   │   (Helius, Solana)    │
└──────────┬──────────┴────────┬─────────┴───────────┬───────────┘
           │                   │                     │
           ▼                   ▼                     ▼
┌────────────────────────────────────────────────────────────────┐
│                    VALIDATION & ENRICHMENT                     │
├────────────────────────────────────────────────────────────────┤
│  Schema Check  │ Anomaly Flag  │ Cross-Verify  │ Quality Score │
└──────────┬─────────────────────────────────────────────────────┘
           │
           ▼
┌────────────────────────────────────────────────────────────────┐
│                       DATA STORAGE LAYER                       │
├────────────────────────────────────────────────────────────────┤
│  Hot Store (Redis)  │ Warm Store (SQLite) │ Cold (Parquet)     │
│  Real-time prices   │ Recent history (7d) │ Historical data    │
│  TTL: 5 minutes     │ Indexed, queryable  │ Compressed, ML     │
└──────────┬─────────────────────────────────────────────────────┘
           │
           ▼
┌────────────────────────────────────────────────────────────────┐
│                       CONSUMPTION LAYER                        │
├────────────────────────────────────────────────────────────────┤
│ Trading Signals │   ML Training   │   Backtesting   │Dashboards│
│ (meme-trader)   │ (llama-analyst) │ (meme-executor) │ (Reports)│
└────────────────────────────────────────────────────────────────┘
```
Data Flow Configuration:
```typescript
interface PipelineConfig {
  sources: DataSource[];
  validationRules: ValidationRule[];
  enrichmentSteps: EnrichmentFunction[];
  storageTargets: StorageTarget[];
  alertsEnabled: boolean;
  qualityThreshold: number;
}

const defaultPipeline: PipelineConfig = {
  sources: [
    { name: 'dexscreener', type: 'websocket', priority: 1 },
    { name: 'birdeye', type: 'rest', priority: 2 },
    { name: 'jupiter', type: 'rest', priority: 3 },
  ],
  validationRules: ['schema', 'range', 'anomaly', 'freshness'],
  enrichmentSteps: ['normalize', 'calculate_indicators', 'tag_quality'],
  storageTargets: ['redis:hot', 'sqlite:warm'],
  alertsEnabled: true,
  qualityThreshold: 85,
};
```
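How the string rule names in `validationRules` map to executable checks is not shown. A hypothetical wiring (the registry and the individual rule bodies are placeholders, not the skill's actual implementation) might look like:

```typescript
// Hypothetical rule registry: named rules are looked up and applied
// in order; a record is rejected if any configured rule fails.
type Rule = (record: Record<string, unknown>) => boolean;

const ruleRegistry: Record<string, Rule> = {
  schema: (r) => typeof r.price === "number" && typeof r.token === "string",
  range: (r) => (r.price as number) > 0,
  anomaly: () => true, // placeholder: plug in a real outlier check
  freshness: (r) => Date.now() - (r.ts as number) < 60_000,
};

function validate(record: Record<string, unknown>, rules: string[]): boolean {
  // Unknown rule names fail closed rather than being skipped.
  return rules.every((name) => ruleRegistry[name]?.(record) ?? false);
}
```

Failing closed on unknown rule names is a deliberate choice here: a typo in the config should surface as rejected data, not silently skipped validation.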
4. ML/AI Integration Framework
Data Preparation for ML:
```typescript
interface MLReadyDataset {
  features: {
    price_data: TimeSeriesFeatures;
    volume_data: TimeSeriesFeatures;
    onchain_metrics: OnChainFeatures;
    sentiment_scores: SentimentFeatures;
    technical_indicators: TechnicalFeatures;
  };
  labels: {
    price_direction: 'up' | 'down' | 'sideways';
    price_magnitude: number;
    optimal_action: 'buy' | 'sell' | 'hold';
  };
  metadata: {
    timestamp: Date;
    token: string;
    quality_score: number;
    source_count: number;
  };
}

interface TimeSeriesFeatures {
  values: number[];
  timestamps: Date[];
  normalized: number[]; // Z-score normalized
  lagged: number[][];   // [lag_1, lag_5, lag_15, lag_60]
  rolling_stats: {
    mean_5: number[];
    std_5: number[];
    mean_15: number[];
    std_15: number[];
  };
}
```
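The `normalized` and `rolling_stats` fields above reduce to two small transforms; a sketch (the window sizes follow the interface's field names, everything else is standard arithmetic):

```typescript
// Z-score normalization over the whole series (constant series map to zeros).
function zNormalize(values: number[]): number[] {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const std = Math.sqrt(
    values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length
  );
  return std === 0 ? values.map(() => 0) : values.map((v) => (v - mean) / std);
}

// Simple trailing rolling mean; output has (length - window + 1) points.
function rollingMean(values: number[], window: number): number[] {
  const out: number[] = [];
  for (let i = window - 1; i < values.length; i++) {
    const slice = values.slice(i - window + 1, i + 1);
    out.push(slice.reduce((a, b) => a + b, 0) / window);
  }
  return out;
}

// rollingMean([1, 2, 3, 4, 5], 5) → [3]
```

For live features you would normalize against a trailing window rather than the full history, to avoid lookahead bias.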
Supported ML Techniques:
- Anomaly Detection: Isolation forests for unusual price/volume patterns
- NLP/Sentiment: BERT-based sentiment from social feeds
- Time Series: LSTM/Transformer for price prediction
- Classification: XGBoost for buy/sell signal classification
- Reinforcement Learning: DQN for optimal trade execution
Continuous Learning Pipeline:
```
New Data → Feature Extraction → Model Inference → Signal Generation
                                                         │
                                                         ▼
                                               Performance Feedback
                                                         │
                                                         ▼
                                          Model Retraining (Weekly)
```
5. Backtesting Infrastructure
Historical Data Requirements:
```typescript
interface BacktestDataset {
  token: string;
  timeframe: '1m' | '5m' | '15m' | '1h' | '4h' | '1d';
  start_date: Date;
  end_date: Date;
  data_points: {
    timestamp: Date;
    open: number;
    high: number;
    low: number;
    close: number;
    volume: number;
    liquidity: number;
    holders: number;
    sentiment_score?: number;
  }[];
  quality_metrics: DataQualityMetrics;
}
```
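Given a fixed candle interval, the dataset's completeness (the "Coverage" and "Missing Points" figures the backtest report quotes) can be derived from the timestamps alone. A simplified sketch, assuming an evenly spaced grid with both endpoints included:

```typescript
// Expected points = span / interval + 1 (inclusive endpoints);
// coverage is actual / expected. Irregular grids need more care.
function coverage(
  timestamps: Date[],
  start: Date,
  end: Date,
  intervalMs: number
): { expected: number; missing: number; coveragePct: number } {
  const expected =
    Math.floor((end.getTime() - start.getTime()) / intervalMs) + 1;
  const missing = expected - timestamps.length;
  return { expected, missing, coveragePct: (timestamps.length / expected) * 100 };
}

// 1-minute candles over 10 minutes with one gap: 11 expected, 1 missing.
const start = new Date("2024-01-01T00:00:00Z");
const end = new Date("2024-01-01T00:10:00Z");
const report = coverage(Array(10).fill(start), start, end, 60_000);
```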
Backtest Execution:
```bash
# Run backtest with historical data
npx tsx .claude/skills/data-orchestrator/scripts/backtest-runner.ts \
  --strategy "momentum" \
  --token "BONK" \
  --start "2024-01-01" \
  --end "2024-12-01" \
  --initial-capital 1000 \
  --slippage 0.01
```
Backtest Report Output:
```
BACKTEST REPORT: Momentum Strategy on BONK
Period: 2024-01-01 to 2024-12-01
PERFORMANCE:
- Total Return: +234.5%
- Sharpe Ratio: 1.87
- Max Drawdown: -28.3%
- Win Rate: 62.4%
- Profit Factor: 2.15
TRADES:
- Total Trades: 156
- Avg Trade Duration: 4.2 hours
- Best Trade: +45.2%
- Worst Trade: -12.8%
DATA QUALITY:
- Coverage: 99.2%
- Missing Points: 847 / 105,120
- Quality Score: 94/100
CAVEATS:
- Historical results do not guarantee future performance
- Slippage model: 1% (actual may vary)
- Does not account for: MEV, extreme volatility periods
```
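Two of the report metrics have compact definitions worth pinning down. The sketches below use the standard formulas for max drawdown (largest peak-to-trough loss on the equity curve) and win rate; they are not necessarily what backtest-runner.ts implements:

```typescript
// Largest peak-to-trough decline, as a fraction of the peak.
// Assumes a positive equity curve.
function maxDrawdown(equity: number[]): number {
  let peak = -Infinity;
  let maxDd = 0;
  for (const v of equity) {
    peak = Math.max(peak, v);
    maxDd = Math.max(maxDd, (peak - v) / peak);
  }
  return maxDd; // e.g. 0.283 ≙ the report's -28.3%
}

// Fraction of trades with a positive return.
function winRate(tradeReturns: number[]): number {
  const wins = tradeReturns.filter((r) => r > 0).length;
  return wins / tradeReturns.length;
}

// maxDrawdown([100, 120, 90, 110]) → 0.25 (the 120 → 90 leg)
```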
6. Risk & Portfolio Data Layer
Portfolio Metrics:
```typescript
interface PortfolioData {
  positions: {
    token: string;
    entry_price: number;
    current_price: number;
    size: number;
    unrealized_pnl: number;
    allocation_pct: number;
    risk_score: number;
  }[];
  aggregate: {
    total_value: number;
    total_pnl: number;
    daily_var: number;           // Value at Risk (95%)
    beta_to_sol: number;
    concentration_score: number; // Herfindahl index
  };
  limits: {
    max_position_size: number;
    max_daily_loss: number;
    max_correlation: number;
    stop_loss_pct: number;
  };
}
```
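The `concentration_score` above is a Herfindahl index: the sum of squared allocation fractions, ranging from 1/N for N equal-weight positions up to 1.0 for a single position. A minimal sketch:

```typescript
// Herfindahl index over position allocations (any units; the
// function normalizes by the total before squaring).
function herfindahl(allocations: number[]): number {
  const total = allocations.reduce((a, b) => a + b, 0);
  return allocations.reduce((acc, a) => acc + (a / total) ** 2, 0);
}

// herfindahl([25, 25, 25, 25]) → 0.25 (equal weights, N = 4)
// herfindahl([100])            → 1.0  (fully concentrated)
```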
Risk Data Sources:
- Position tracking from meme-executor
- Price volatility from historical data
- Correlation matrix from cross-asset analysis
- Liquidity depth from DEX APIs
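The `daily_var` field (95% Value at Risk) can be estimated from the same historical return data. A nearest-rank historical-VaR sketch; the estimation method is an assumption, since the skill does not say how VaR is computed:

```typescript
// Historical VaR: the (1 - confidence) quantile of daily returns,
// reported as a positive loss fraction. Nearest-rank quantile.
function historicalVaR(dailyReturns: number[], confidence = 0.95): number {
  const sorted = [...dailyReturns].sort((a, b) => a - b);
  const idx = Math.floor((1 - confidence) * sorted.length);
  return -sorted[Math.min(idx, sorted.length - 1)];
}

// With 20 observations, 95% VaR picks the second-worst daily return.
const var95 = historicalVaR([-0.10, -0.08, ...Array(18).fill(0.01)]);
```

Parametric (variance-based) or Monte Carlo VaR are common alternatives when the return history is short.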
Implementation Scripts
Data Pipeline Manager
```bash
# Start the data pipeline
npx tsx .claude/skills/data-orchestrator/scripts/pipeline-manager.ts \
  --mode production \
  --sources all \
  --storage sqlite,redis

# Validate specific data source
npx tsx .claude/skills/data-orchestrator/scripts/validate-source.ts \
  --source dexscreener \
  --token "BONK" \
  --verbose

# Generate ML-ready dataset
npx tsx .claude/skills/data-orchestrator/scripts/ml-dataset-builder.ts \
  --token "BONK" \
  --features "price,volume,sentiment" \
  --lookback 30 \
  --output ./datasets/bonk_ml_ready.parquet
```
Integration with Other Skills
Data Orchestrator provides to:
- meme-trader: Validated price/volume data, quality scores
- llama-analyst: DeFi protocol metrics, TVL/revenue time series
- meme-executor: Real-time execution prices, slippage estimates
- flow-tracker: On-chain flow data, whale movements
- degen-savant: Sentiment aggregates, social momentum
Data Orchestrator receives from:
- All skills: Data quality feedback, missing data requests
- meme-executor: Trade execution data for backtest validation
Quality Gates
- All data must have quality score >= 80% for trading signals
- Price data staleness: max 30 seconds for live trading
- Historical data completeness: min 95% for backtesting
- Cross-source agreement: min 2 sources for critical decisions
- Schema validation: 100% compliance required
- Anomaly flagging: auto-reject data points > 5 sigma
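The first two gates compose into a simple admission check for live trading data. A sketch with the thresholds taken directly from the list above (the function itself is illustrative, not part of the skill's scripts):

```typescript
// Gate check: quality score >= 80 AND price data no older than
// 30 seconds, per the quality gates listed above.
function passesGates(qualityScore: number, ageMs: number): boolean {
  const MIN_QUALITY = 80;
  const MAX_STALENESS_MS = 30_000;
  return qualityScore >= MIN_QUALITY && ageMs <= MAX_STALENESS_MS;
}

// passesGates(92, 10_000) → true
// passesGates(92, 45_000) → false (stale)
```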
Error Handling
- Source unavailable: Failover to backup source within 5 seconds
- Quality below threshold: Alert + fallback to last good data
- Schema mismatch: Log error, use default values, alert
- Rate limit hit: Exponential backoff, rotate API keys
- Network timeout: Retry 3x with increasing delay
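The backoff rule above implies a doubling delay schedule. A one-line sketch (the 1-second base delay is an assumption; jitter, also common in practice, is omitted):

```typescript
// Exponential backoff schedule: each retry waits twice as long
// as the previous one, starting from baseMs.
function backoffDelays(attempts: number, baseMs = 1_000): number[] {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// backoffDelays(3) → [1000, 2000, 4000]
```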
Compliance & Security
- Encrypt API keys at rest and in transit
- Log all data access for audit trails
- Implement IP whitelisting for production
- GDPR-compliant: no PII in trading data
- Rate limit all external calls (prevent API bans)
- Regular security audits of data pipeline
Performance Targets
| Metric | Target | Measurement |
|--------|--------|-------------|
| Data latency (real-time) | < 500ms | WebSocket to storage |
| Validation throughput | > 10K records/sec | Validation pipeline |
| Quality score accuracy | > 95% | vs manual audit |
| Backtest data coverage | > 99% | Historical completeness |
| ML feature freshness | < 5 min | Feature store update |
Resources
- references/data-sources.md - Complete API documentation
- references/quality-standards.md - Validation rule definitions
- references/ml-feature-catalog.md - Available ML features
- scripts/pipeline-manager.ts - Main orchestration script
- scripts/quality-monitor.ts - Continuous quality monitoring
- scripts/backtest-runner.ts - Historical backtesting