# DSPy: Declarative Language Model Programming

Build complex AI systems with declarative programming, optimize prompts automatically, and create modular RAG systems and agents with DSPy, Stanford NLP's framework for systematic LM programming.
## When to Use This Skill

Use DSPy when you need to:

- Build complex AI systems with multiple components and workflows
- Program LMs declaratively instead of manual prompt engineering
- Optimize prompts automatically using data-driven methods
- Create modular AI pipelines that are maintainable and portable
- Improve model outputs systematically with optimizers
- Build RAG systems, agents, or classifiers with better reliability

GitHub Stars: 22,000+ | Created By: Stanford NLP

## Installation

```bash
# Stable release
pip install dspy

# Latest development version
pip install git+https://github.com/stanfordnlp/dspy.git

# With specific LM providers
pip install "dspy[openai]"     # OpenAI
pip install "dspy[anthropic]"  # Anthropic Claude
pip install "dspy[all]"        # All providers
```
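To confirm the install, a minimal sanity check (this assumes the package exposes `__version__`, as recent releases do):

```python
import dspy

print(dspy.__version__)  # prints the installed version string
```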

## Quick Start

### Basic Example: Question Answering

```python
import dspy

# Configure your language model
lm = dspy.Claude(model="claude-sonnet-4-5-20250929")
dspy.settings.configure(lm=lm)

# Define a signature (inputs -> outputs)
class QA(dspy.Signature):
    """Answer questions with short factual answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

# Create a module
qa = dspy.Predict(QA)

# Use it
response = qa(question="What is the capital of France?")
print(response.answer)  # "Paris"
```

### Chain of Thought Reasoning

```python
import dspy

lm = dspy.Claude(model="claude-sonnet-4-5-20250929")
dspy.settings.configure(lm=lm)

# Use ChainOfThought for better reasoning
class MathProblem(dspy.Signature):
    """Solve math word problems."""
    problem = dspy.InputField()
    answer = dspy.OutputField(desc="numerical answer")

# ChainOfThought generates reasoning steps automatically
cot = dspy.ChainOfThought(MathProblem)

response = cot(problem="If John has 5 apples and gives 2 to Mary, how many does he have?")
print(response.rationale)  # Shows reasoning steps
print(response.answer)     # "3"
```

## Core Concepts

### 1. Signatures

Signatures define the structure of your AI task (inputs → outputs):

```python
# Inline signature (simple)
qa = dspy.Predict("question -> answer")

# Class signature (detailed)
class Summarize(dspy.Signature):
    """Summarize text into key points."""
    text = dspy.InputField()
    summary = dspy.OutputField(desc="bullet points, 3-5 items")

summarizer = dspy.ChainOfThought(Summarize)
```

When to use each:

- Inline: Quick prototyping, simple tasks
- Class: Complex tasks, type hints, better documentation
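For comparison, a sketch of the same task written both ways (the sentiment task and field names here are illustrative):

```python
import dspy

# Inline: compact, field names only
classify_inline = dspy.Predict("text -> sentiment")

# Class: a docstring and field descriptions guide the LM more precisely
class ClassifySentiment(dspy.Signature):
    """Classify the sentiment of a piece of text."""
    text = dspy.InputField(desc="text to classify")
    sentiment = dspy.OutputField(desc="one of: positive, negative, neutral")

classify = dspy.Predict(ClassifySentiment)
result = classify(text="I loved this movie!")
print(result.sentiment)  # e.g. "positive"
```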

### 2. Modules

Modules are reusable components that transform inputs to outputs:

#### dspy.Predict

Basic prediction module:

```python
predictor = dspy.Predict("context, question -> answer")
result = predictor(
    context="Paris is the capital of France",
    question="What is the capital?",
)
```

#### dspy.ChainOfThought

Generates reasoning steps before answering:

```python
cot = dspy.ChainOfThought("question -> answer")
result = cot(question="Why is the sky blue?")
print(result.rationale)  # Reasoning steps
print(result.answer)     # Final answer
```

#### dspy.ReAct

Agent-like reasoning with tools:

```python
import dspy
from dspy.predict import ReAct

class SearchQA(dspy.Signature):
    """Answer questions using search."""
    question = dspy.InputField()
    answer = dspy.OutputField()

def search_tool(query: str) -> str:
    """Search Wikipedia."""
    # Your search implementation goes here; `wikipedia_search` is a placeholder
    results = wikipedia_search(query)
    return results

react = ReAct(SearchQA, tools=[search_tool])
result = react(question="When was Python created?")
```

#### dspy.ProgramOfThought

Generates and executes code for reasoning:

```python
pot = dspy.ProgramOfThought("question -> answer")
result = pot(question="What is 15% of 240?")
# Internally generates and runs code like: answer = 240 * 0.15
```

### 3. Optimizers

Optimizers improve your modules automatically using training data:

#### BootstrapFewShot

Learns from examples:

```python
from dspy.teleprompt import BootstrapFewShot

# Training data
trainset = [
    dspy.Example(question="What is 2+2?", answer="4").with_inputs("question"),
    dspy.Example(question="What is 3+5?", answer="8").with_inputs("question"),
]

# Define metric
def validate_answer(example, pred, trace=None):
    return example.answer == pred.answer

# Optimize
optimizer = BootstrapFewShot(metric=validate_answer, max_bootstrapped_demos=3)
optimized_qa = optimizer.compile(qa, trainset=trainset)

# Now optimized_qa performs better!
```

#### MIPRO (Multiprompt Instruction Proposal Optimizer)

Iteratively improves prompts:

```python
from dspy.teleprompt import MIPRO

optimizer = MIPRO(
    metric=validate_answer,
    num_candidates=10,
    init_temperature=1.0,
)

optimized_cot = optimizer.compile(
    cot,
    trainset=trainset,
    num_trials=100,
)
```

#### BootstrapFinetune

Creates datasets for model fine-tuning:

```python
from dspy.teleprompt import BootstrapFinetune

optimizer = BootstrapFinetune(metric=validate_answer)
optimized_module = optimizer.compile(qa, trainset=trainset)
# Exports training data for fine-tuning
```

### 4. Building Complex Systems

#### Multi-Stage Pipeline

```python
import dspy

class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        # Assumes a retrieval model (rm) is configured in dspy.settings
        self.retrieve = dspy.Retrieve(k=3)
        self.generate_query = dspy.ChainOfThought("question -> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Stage 1: Generate search query
        search_query = self.generate_query(question=question).search_query

        # Stage 2: Retrieve context
        passages = self.retrieve(search_query).passages
        context = "\n".join(passages)

        # Stage 3: Generate answer
        answer = self.generate_answer(context=context, question=question).answer
        return dspy.Prediction(answer=answer, context=context)

# Use the pipeline
qa_system = MultiHopQA()
result = qa_system(question="Who wrote the book that inspired the movie Blade Runner?")
```

#### RAG System with Optimization

```python
import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM

# Configure the retriever and register it as the retrieval model,
# so that dspy.Retrieve uses it
retriever = ChromadbRM(
    collection_name="documents",
    persist_directory="./chroma_db",
)
dspy.settings.configure(rm=retriever)

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# Create and optimize
rag = RAG()

# Optimize with training data
from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(metric=validate_answer)
optimized_rag = optimizer.compile(rag, trainset=trainset)
```
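The compiled program is then called like any other module (the question is illustrative):

```python
result = optimized_rag(question="What does the handbook say about remote work?")
print(result.answer)
```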

## LM Provider Configuration

### Anthropic Claude

```python
import dspy

lm = dspy.Claude(
    model="claude-sonnet-4-5-20250929",
    api_key="your-api-key",  # or set the ANTHROPIC_API_KEY env var
    max_tokens=1000,
    temperature=0.7,
)
dspy.settings.configure(lm=lm)
```

### OpenAI

```python
lm = dspy.OpenAI(
    model="gpt-4",
    api_key="your-api-key",
    max_tokens=1000,
)
dspy.settings.configure(lm=lm)
```

### Local Models (Ollama)

```python
lm = dspy.OllamaLocal(
    model="llama3.1",
    base_url="http://localhost:11434",
)
dspy.settings.configure(lm=lm)
```

### Multiple Models

```python
# Different models for different tasks
cheap_lm = dspy.OpenAI(model="gpt-3.5-turbo")
strong_lm = dspy.Claude(model="claude-sonnet-4-5-20250929")

# Use the cheap model for retrieval, the strong model for reasoning
# (retriever and generator are modules defined elsewhere)
with dspy.settings.context(lm=cheap_lm):
    context = retriever(question)

with dspy.settings.context(lm=strong_lm):
    answer = generator(context=context, question=question)
```

## Common Patterns

### Pattern 1: Structured Output

```python
import dspy
from pydantic import BaseModel, Field

class PersonInfo(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")
    occupation: str = Field(description="Current job")

class ExtractPerson(dspy.Signature):
    """Extract person information from text."""
    text = dspy.InputField()
    person: PersonInfo = dspy.OutputField()

extractor = dspy.TypedPredictor(ExtractPerson)
result = extractor(text="John Doe is a 35-year-old software engineer.")
print(result.person.name)  # "John Doe"
print(result.person.age)   # 35
```

### Pattern 2: Assertion-Driven Optimization

```python
import dspy
from dspy.primitives.assertions import assert_transform_module, backtrack_handler

def is_number(text) -> bool:
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False

class MathQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.solve = dspy.ChainOfThought("problem -> solution: float")

    def forward(self, problem):
        solution = self.solve(problem=problem).solution
        # Assert the solution is numeric; on failure, DSPy backtracks and
        # retries with the error message injected into the prompt
        dspy.Assert(is_number(solution), "Solution must be a number")
        return dspy.Prediction(solution=solution)

# Wrap the module to activate assertion-driven backtracking
math_qa = assert_transform_module(MathQA(), backtrack_handler)
```

### Pattern 3: Self-Consistency

```python
import dspy
from collections import Counter

class ConsistentQA(dspy.Module):
    def __init__(self, num_samples=5):
        super().__init__()
        self.qa = dspy.ChainOfThought("question -> answer")
        self.num_samples = num_samples

    def forward(self, question):
        # Generate multiple answers
        answers = []
        for _ in range(self.num_samples):
            result = self.qa(question=question)
            answers.append(result.answer)

        # Return the most common answer
        most_common = Counter(answers).most_common(1)[0][0]
        return dspy.Prediction(answer=most_common)
```
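A usage sketch. Self-consistency only helps when samples can differ, so this assumes a nonzero sampling temperature; note also that DSPy caches identical LM calls, which can otherwise make every sample come back the same:

```python
qa = ConsistentQA(num_samples=5)
result = qa(question="In what year did Apollo 11 land on the Moon?")
print(result.answer)  # the majority answer across samples, e.g. "1969"
```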

### Pattern 4: Retrieval with Reranking

```python
class RerankedRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=10)
        self.rerank = dspy.Predict("question, passage -> relevance_score: float")
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Retrieve candidates
        passages = self.retrieve(question).passages

        # Rerank passages by LM-assigned relevance
        scored = []
        for passage in passages:
            score = float(self.rerank(question=question, passage=passage).relevance_score)
            scored.append((score, passage))

        # Take the top 3
        top_passages = [p for _, p in sorted(scored, reverse=True)[:3]]
        context = "\n\n".join(top_passages)

        # Generate answer
        return self.answer(context=context, question=question)
```

## Evaluation and Metrics

### Custom Metrics

```python
def exact_match(example, pred, trace=None):
    """Exact match metric."""
    return example.answer.lower() == pred.answer.lower()

def f1_score(example, pred, trace=None):
    """F1 score for token overlap."""
    pred_tokens = set(pred.answer.lower().split())
    gold_tokens = set(example.answer.lower().split())
    if not pred_tokens:
        return 0.0
    precision = len(pred_tokens & gold_tokens) / len(pred_tokens)
    recall = len(pred_tokens & gold_tokens) / len(gold_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)
```
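A quick sanity check of `exact_match` on hand-built objects (the values are illustrative):

```python
import dspy

example = dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question")
pred = dspy.Prediction(answer="paris")
print(exact_match(example, pred))  # True, since the comparison is case-insensitive
```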

### Evaluation

```python
from dspy.evaluate import Evaluate

# Create evaluator
evaluator = Evaluate(
    devset=testset,
    metric=exact_match,
    num_threads=4,
    display_progress=True,
)

# Evaluate model
score = evaluator(qa_system)
print(f"Accuracy: {score}")

# Compare optimized vs unoptimized
score_before = evaluator(qa)
score_after = evaluator(optimized_qa)
print(f"Improvement: {score_after - score_before:.2%}")
```

## Best Practices

### 1. Start Simple, Iterate

```python
# Start with Predict
qa = dspy.Predict("question -> answer")

# Add reasoning if needed
qa = dspy.ChainOfThought("question -> answer")

# Add optimization when you have data
optimized_qa = optimizer.compile(qa, trainset=data)
```

### 2. Use Descriptive Signatures

```python
# ❌ Bad: vague
class Task(dspy.Signature):
    input = dspy.InputField()
    output = dspy.OutputField()

# ✅ Good: descriptive
class SummarizeArticle(dspy.Signature):
    """Summarize news articles into 3-5 key points."""
    article = dspy.InputField(desc="full article text")
    summary = dspy.OutputField(desc="bullet points, 3-5 items")
```

### 3. Optimize with Representative Data

```python
# Create diverse training examples
trainset = [
    dspy.Example(question="factual", answer="...").with_inputs("question"),
    dspy.Example(question="reasoning", answer="...").with_inputs("question"),
    dspy.Example(question="calculation", answer="...").with_inputs("question"),
]

# Use a validation set with your metric
def metric(example, pred, trace=None):
    return example.answer in pred.answer
```

### 4. Save and Load Optimized Models

```python
# Save
optimized_qa.save("models/qa_v1.json")

# Load
loaded_qa = dspy.ChainOfThought("question -> answer")
loaded_qa.load("models/qa_v1.json")
```

### 5. Monitor and Debug

```python
# Enable tracing
dspy.settings.configure(lm=lm, trace=[])

# Run a prediction
result = qa(question="...")

# Inspect the most recent prompts and responses sent to the LM
lm.inspect_history(n=3)
```

## Comparison to Other Approaches

| Feature | Manual Prompting | LangChain | DSPy |
|---------|------------------|-----------|------|
| Prompt engineering | Manual | Manual | Automatic |
| Optimization | Trial & error | None | Data-driven |
| Modularity | Low | Medium | High |
| Type safety | No | Limited | Yes (Signatures) |
| Portability | Low | Medium | High |
| Learning curve | Low | Medium | Medium-High |

When to choose DSPy:

- You have training data or can generate it
- You need systematic prompt improvement
- You're building complex multi-stage systems
- You want to optimize across different LMs

When to choose alternatives:

- Quick prototypes (manual prompting)
- Simple chains with existing tools (LangChain)
- Custom optimization logic needed

## Resources

- Documentation: https://dspy.ai
- GitHub: https://github.com/stanfordnlp/dspy (22k+ stars)
- Discord: https://discord.gg/XCGy2WDCQB
- Twitter: @DSPyOSS
- Paper: "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines"

## See Also

- `references/modules.md` - Detailed module guide (Predict, ChainOfThought, ReAct, ProgramOfThought)
- `references/optimizers.md` - Optimization algorithms (BootstrapFewShot, MIPRO, BootstrapFinetune)
- `references/examples.md` - Real-world examples (RAG, agents, classifiers)