**instructor**

Skill from ovachiever/droid-tings

**What it does:** Extracts and validates structured data from LLM responses using Pydantic schemas, supporting multiple providers with automatic retries and type safety.

Part of ovachiever/droid-tings (370 items). Added Feb 4, 2026.

> Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream partial results with Instructor, a battle-tested structured-output library.

# Instructor: Structured LLM Outputs

## When to Use This Skill

Use Instructor when you need to:

  • Extract structured data from LLM responses reliably
  • Validate outputs against Pydantic schemas automatically
  • Retry failed extractions with automatic error handling
  • Parse complex JSON with type safety and validation
  • Stream partial results for real-time processing
  • Support multiple LLM providers with consistent API

GitHub Stars: 15,000+ | Battle-tested: 100,000+ developers

## Installation

```bash
# Base installation
pip install instructor

# With specific providers
pip install "instructor[anthropic]"  # Anthropic Claude
pip install "instructor[openai]"     # OpenAI
pip install "instructor[all]"        # All providers
```

## Quick Start

### Basic Example: Extract User Data

```python
import instructor
from pydantic import BaseModel
from anthropic import Anthropic

# Define output structure
class User(BaseModel):
    name: str
    age: int
    email: str

# Create instructor client
client = instructor.from_anthropic(Anthropic())

# Extract structured data
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "John Doe is 30 years old. His email is john@example.com"
    }],
    response_model=User
)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"
```

### With OpenAI

```python
from openai import OpenAI

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: Alice, 25, alice@email.com"}]
)
```

## Core Concepts

### 1. Response Models (Pydantic)

Response models define the structure and validation rules for LLM outputs.

#### Basic Model

```python
from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of relevant tags")

article = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Analyze this article: [article text]"
    }],
    response_model=Article
)
```

Benefits:

  • Type safety with Python type hints
  • Automatic validation (word_count > 0)
  • Self-documenting with Field descriptions
  • IDE autocomplete support
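Because constraints like `gt=0` are enforced by Pydantic itself, you can exercise them locally with no LLM call at all. A minimal check, with the model trimmed to two fields for brevity:

```python
from pydantic import BaseModel, Field, ValidationError

class Article(BaseModel):
    title: str = Field(description="Article title")
    word_count: int = Field(description="Number of words", gt=0)

# Valid data passes straight through
ok = Article(title="Hello", word_count=120)

# word_count=0 violates gt=0, so Pydantic rejects it before your code sees it
try:
    Article(title="Hello", word_count=0)
    rejected = False
except ValidationError:
    rejected = True

print(ok.word_count, rejected)  # 120 True
```

The same rejection is what triggers Instructor's retry loop when the data comes from an LLM instead of your test code.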

#### Nested Models

```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

person = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "John lives at 123 Main St, Boston, USA"
    }],
    response_model=Person
)

print(person.address.city)  # "Boston"
```

#### Optional Fields

```python
from typing import Optional

class Product(BaseModel):
    name: str
    price: float
    discount: Optional[float] = None  # Optional
    description: str = Field(default="No description")  # Default value

# The LLM doesn't need to provide discount or description
```

#### Enums for Constraints

```python
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Review(BaseModel):
    text: str
    sentiment: Sentiment  # Only these 3 values allowed

review = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "This product is amazing!"
    }],
    response_model=Review
)

print(review.sentiment)  # Sentiment.POSITIVE
```
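The enum constraint is plain Pydantic behavior, so it can be previewed locally: string values are coerced into the enum, and anything outside the allowed set is rejected. The example values below are illustrative:

```python
from enum import Enum
from pydantic import BaseModel, ValidationError

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Review(BaseModel):
    text: str
    sentiment: Sentiment

# Plain strings are coerced into the enum during validation
r = Review(text="Great!", sentiment="positive")

# Values outside the enum are rejected
try:
    Review(text="Hmm", sentiment="mixed")
    coerced_only = False
except ValidationError:
    coerced_only = True

print(r.sentiment is Sentiment.POSITIVE, coerced_only)  # True True
```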

### 2. Validation

Pydantic validates LLM outputs automatically. If validation fails, Instructor retries.

#### Built-in Validators

```python
from pydantic import BaseModel, Field, EmailStr, HttpUrl

class Contact(BaseModel):
    name: str = Field(min_length=2, max_length=100)
    age: int = Field(ge=0, le=120)  # 0 <= age <= 120
    email: EmailStr   # Validates email format
    website: HttpUrl  # Validates URL format

# If the LLM provides invalid data, Instructor retries automatically
```

#### Custom Validators

```python
import re
from pydantic import BaseModel, field_validator

class Event(BaseModel):
    name: str
    date: str
    attendees: int

    @field_validator('date')
    @classmethod
    def validate_date(cls, v):
        """Ensure date is in YYYY-MM-DD format."""
        if not re.match(r'\d{4}-\d{2}-\d{2}', v):
            raise ValueError('Date must be YYYY-MM-DD format')
        return v

    @field_validator('attendees')
    @classmethod
    def validate_attendees(cls, v):
        """Ensure at least one attendee."""
        if v < 1:
            raise ValueError('Must have at least 1 attendee')
        return v
```
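Validators run at model construction, so they can be tested without an LLM in the loop. A self-contained check, redefining `Event` with just the date rule for brevity:

```python
import re
from pydantic import BaseModel, ValidationError, field_validator

class Event(BaseModel):
    name: str
    date: str
    attendees: int

    @field_validator('date')
    @classmethod
    def validate_date(cls, v):
        if not re.match(r'\d{4}-\d{2}-\d{2}', v):
            raise ValueError('Date must be YYYY-MM-DD format')
        return v

# A well-formed date passes
ok = Event(name="PyCon", date="2025-05-14", attendees=800)

# A free-form date raises ValidationError, which is what prompts a retry
try:
    Event(name="PyCon", date="May 14th", attendees=800)
    rejected = False
except ValidationError:
    rejected = True

print(ok.date, rejected)  # 2025-05-14 True
```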

#### Model-Level Validation

```python
from datetime import datetime
from pydantic import BaseModel, model_validator

class DateRange(BaseModel):
    start_date: str
    end_date: str

    @model_validator(mode='after')
    def check_dates(self):
        """Ensure end_date is not before start_date."""
        start = datetime.strptime(self.start_date, '%Y-%m-%d')
        end = datetime.strptime(self.end_date, '%Y-%m-%d')
        if end < start:
            raise ValueError('end_date must be after start_date')
        return self
```
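Model-level validators behave the same way: the cross-field rule runs locally at construction time. A quick check with illustrative dates:

```python
from datetime import datetime
from pydantic import BaseModel, ValidationError, model_validator

class DateRange(BaseModel):
    start_date: str
    end_date: str

    @model_validator(mode='after')
    def check_dates(self):
        start = datetime.strptime(self.start_date, '%Y-%m-%d')
        end = datetime.strptime(self.end_date, '%Y-%m-%d')
        if end < start:
            raise ValueError('end_date must be after start_date')
        return self

# An ordered range passes
ok = DateRange(start_date="2025-01-01", end_date="2025-02-01")

# An inverted range is rejected by the cross-field check
try:
    DateRange(start_date="2025-02-01", end_date="2025-01-01")
    inverted_rejected = False
except ValidationError:
    inverted_rejected = True

print(ok.end_date, inverted_rejected)  # 2025-02-01 True
```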

### 3. Automatic Retrying

Instructor retries automatically when validation fails, providing error feedback to the LLM.

```python
# Retries up to 3 times if validation fails
user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract user from: John, age unknown"
    }],
    response_model=User,
    max_retries=3  # Default is 3
)

# If age can't be extracted, Instructor tells the LLM:
#   "Validation error: age - field required"
# and the LLM tries again with better extraction
```

How it works:

  1. LLM generates output
  2. Pydantic validates
  3. If invalid: Error message sent back to LLM
  4. LLM tries again with error feedback
  5. Repeats up to max_retries
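The feedback loop above can be sketched in plain Python. This is a simplified sketch, not Instructor's actual implementation, and `llm_call` is a hypothetical stand-in for a real API call:

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int

def extract_with_retries(llm_call, max_retries=3):
    """Simplified sketch of the validate-and-retry loop."""
    feedback = []
    for _ in range(max_retries):
        raw = llm_call(feedback)                  # step 1: generate
        try:
            return User.model_validate_json(raw)  # step 2: validate
        except ValidationError as e:
            feedback.append(f"Validation error:\n{e}")  # steps 3-4: feed error back
    raise RuntimeError("extraction failed after retries")  # step 5: give up

# Stub LLM: omits `age` on the first try, corrects itself after seeing the error
responses = iter(['{"name": "John"}', '{"name": "John", "age": 30}'])
user = extract_with_retries(lambda feedback: next(responses))
print(user.age)  # 30
```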

### 4. Streaming

Stream partial results for real-time processing.

#### Streaming Partial Objects

```python
class Story(BaseModel):
    title: str
    content: str
    tags: list[str]

# Stream partial updates as the LLM generates
for partial_story in client.messages.create_partial(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write a short sci-fi story"
    }],
    response_model=Story
):
    # Fields may still be None while the response streams in
    print(f"Title: {partial_story.title}")
    print(f"Content so far: {(partial_story.content or '')[:100]}...")
    # Update UI in real time
```

#### Streaming Iterables

```python
class Task(BaseModel):
    title: str
    priority: str

# Stream list items as they're generated
tasks = client.messages.create_iterable(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Generate 10 project tasks"
    }],
    response_model=Task
)

for task in tasks:
    print(f"- {task.title} ({task.priority})")
    # Process each task as it arrives
```

## Provider Configuration

### Anthropic Claude

```python
import instructor
from anthropic import Anthropic

client = instructor.from_anthropic(
    Anthropic(api_key="your-api-key")
)

# Use with Claude models
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[...],
    response_model=YourModel
)
```

### OpenAI

```python
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(api_key="your-api-key")
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=YourModel,
    messages=[...]
)
```

### Local Models (Ollama)

```python
from openai import OpenAI

# Point to a local Ollama server via its OpenAI-compatible endpoint
client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama"  # Required but ignored
    ),
    mode=instructor.Mode.JSON
)

response = client.chat.completions.create(
    model="llama3.1",
    response_model=YourModel,
    messages=[...]
)
```

## Common Patterns

### Pattern 1: Data Extraction from Text

```python
class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int
    headquarters: str

text = """
Tesla, Inc. was founded in 2003. It operates in the automotive and energy
industry with approximately 140,000 employees. The company is headquartered
in Austin, Texas.
"""

company = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Extract company information from: {text}"
    }],
    response_model=CompanyInfo
)
```

### Pattern 2: Classification

```python
class Category(str, Enum):
    TECHNOLOGY = "technology"
    FINANCE = "finance"
    HEALTHCARE = "healthcare"
    EDUCATION = "education"
    OTHER = "other"

class ArticleClassification(BaseModel):
    category: Category
    confidence: float = Field(ge=0.0, le=1.0)
    keywords: list[str]

classification = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Classify this article: [article text]"
    }],
    response_model=ArticleClassification
)
```

### Pattern 3: Multi-Entity Extraction

```python
class Person(BaseModel):
    name: str
    role: str

class Organization(BaseModel):
    name: str
    industry: str

class Entities(BaseModel):
    people: list[Person]
    organizations: list[Organization]
    locations: list[str]

text = "Tim Cook, CEO of Apple, announced at the event in Cupertino..."

entities = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Extract all entities from: {text}"
    }],
    response_model=Entities
)

for person in entities.people:
    print(f"{person.name} - {person.role}")
```

### Pattern 4: Structured Analysis

```python
class SentimentAnalysis(BaseModel):
    overall_sentiment: Sentiment
    positive_aspects: list[str]
    negative_aspects: list[str]
    suggestions: list[str]
    score: float = Field(ge=-1.0, le=1.0)

review = "The product works well but setup was confusing..."

analysis = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Analyze this review: {review}"
    }],
    response_model=SentimentAnalysis
)
```

### Pattern 5: Batch Processing

```python
def extract_person(text: str) -> Person:
    return client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Extract person from: {text}"
        }],
        response_model=Person
    )

texts = [
    "John Doe is a 30-year-old engineer",
    "Jane Smith, 25, works in marketing",
    "Bob Johnson, age 40, software developer"
]

people = [extract_person(text) for text in texts]
```

## Advanced Features

### Union Types

```python
from typing import Union
from pydantic import BaseModel, HttpUrl

class TextContent(BaseModel):
    type: str = "text"
    content: str

class ImageContent(BaseModel):
    type: str = "image"
    url: HttpUrl
    caption: str

class Post(BaseModel):
    title: str
    content: Union[TextContent, ImageContent]  # Either type

# The LLM chooses the appropriate type based on the content
```
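Union selection is ordinary Pydantic validation, so you can preview it locally: the union member whose fields match the payload wins. This sketch swaps `HttpUrl` for a plain `str` to stay dependency-free, and the payload values are illustrative:

```python
from typing import Union
from pydantic import BaseModel

class TextContent(BaseModel):
    type: str = "text"
    content: str

class ImageContent(BaseModel):
    type: str = "image"
    url: str  # the original example uses HttpUrl
    caption: str

class Post(BaseModel):
    title: str
    content: Union[TextContent, ImageContent]

# Pydantic selects the union member whose fields match the payload
post = Post.model_validate({
    "title": "Hello",
    "content": {"type": "text", "content": "Just words"},
})
print(type(post.content).__name__)  # TextContent
```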

### Dynamic Models

```python
from pydantic import EmailStr, Field, create_model

# Create a model at runtime
DynamicUser = create_model(
    'User',
    name=(str, ...),
    age=(int, Field(ge=0)),
    email=(EmailStr, ...)
)

user = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[...],
    response_model=DynamicUser
)
```
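A model built with `create_model` behaves like any hand-written one, constraints included. A quick local check, with plain `str` in place of `EmailStr` to avoid the optional email-validator dependency:

```python
from pydantic import Field, ValidationError, create_model

# Schema assembled at runtime
DynamicUser = create_model(
    'User',
    name=(str, ...),
    age=(int, Field(ge=0)),
    email=(str, ...),
)

u = DynamicUser(name="Ada", age=36, email="ada@example.com")

# The ge=0 constraint is enforced just like on a static model
try:
    DynamicUser(name="Ada", age=-1, email="ada@example.com")
    constrained = False
except ValidationError:
    constrained = True

print(u.age, constrained)  # 36 True
```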

### Custom Modes

```python
# For providers without native structured outputs
client = instructor.from_anthropic(
    Anthropic(),
    mode=instructor.Mode.JSON  # JSON mode
)

# Available modes:
# - Mode.ANTHROPIC_TOOLS (recommended for Claude)
# - Mode.JSON (fallback)
# - Mode.TOOLS (OpenAI tools)
```

### Context Management

```python
# Single-use client
with instructor.from_anthropic(Anthropic()) as client:
    result = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=YourModel
    )
# Client closed automatically
```

## Error Handling

### Handling Validation Errors

```python
from pydantic import ValidationError

try:
    user = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=[...],
        response_model=User,
        max_retries=3
    )
except ValidationError as e:
    print(f"Failed after retries: {e}")
    # Handle gracefully
except Exception as e:
    print(f"API error: {e}")
```

### Custom Error Messages

```python
class ValidatedUser(BaseModel):
    name: str = Field(description="Full name, 2-100 characters")
    age: int = Field(description="Age between 0 and 120", ge=0, le=120)
    email: EmailStr = Field(description="Valid email address")

    class Config:
        # Example values guide the LLM toward valid output
        json_schema_extra = {
            "examples": [
                {
                    "name": "John Doe",
                    "age": 30,
                    "email": "john@example.com"
                }
            ]
        }
```

## Best Practices

### 1. Clear Field Descriptions

```python
# ❌ Bad: Vague
class Product(BaseModel):
    name: str
    price: float

# ✅ Good: Descriptive
class Product(BaseModel):
    name: str = Field(description="Product name from the text")
    price: float = Field(description="Price in USD, without currency symbol")
```

### 2. Use Appropriate Validation

```python
# ✅ Good: Constrain values
class Rating(BaseModel):
    score: int = Field(ge=1, le=5, description="Rating from 1 to 5 stars")
    review: str = Field(min_length=10, description="Review text, at least 10 chars")
```

### 3. Provide Examples in Prompts

```python
messages = [{
    "role": "user",
    "content": """Extract person info from: "John, 30, engineer"

Example format:
{
    "name": "John Doe",
    "age": 30,
    "occupation": "engineer"
}"""
}]
```

### 4. Use Enums for Fixed Categories

```python
# ✅ Good: Enum ensures valid values
class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    status: Status  # LLM must choose from the enum
```

### 5. Handle Missing Data Gracefully

```python
class PartialData(BaseModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default_value"

# The LLM only needs to provide required_field
```
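The defaulting behavior is easy to confirm locally; only the required field has to be supplied, and the rest fall back to their defaults (field values below are illustrative):

```python
from typing import Optional
from pydantic import BaseModel

class PartialData(BaseModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default_value"

# Constructing with just the required field succeeds
d = PartialData(required_field="present")
print(d.optional_field, d.default_field)  # None default_value
```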

## Comparison to Alternatives

| Feature | Instructor | Manual JSON | LangChain | DSPy |
|---------|------------|-------------|-----------|------|
| Type Safety | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Auto Validation | ✅ Yes | ❌ No | ❌ No | ⚠️ Limited |
| Auto Retry | ✅ Yes | ❌ No | ❌ No | ✅ Yes |
| Streaming | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
| Multi-Provider | ✅ Yes | ⚠️ Manual | ✅ Yes | ✅ Yes |
| Learning Curve | Low | Low | Medium | High |

When to choose Instructor:

  • Need structured, validated outputs
  • Want type safety and IDE support
  • Require automatic retries
  • Building data extraction systems

When to choose alternatives:

  • DSPy: Need prompt optimization
  • LangChain: Building complex chains
  • Manual: Simple, one-off extractions

## Resources

  • Documentation: https://python.useinstructor.com
  • GitHub: https://github.com/jxnl/instructor (15k+ stars)
  • Cookbook: https://python.useinstructor.com/examples
  • Discord: Community support available

## See Also

  • references/validation.md - Advanced validation patterns
  • references/providers.md - Provider-specific configuration
  • references/examples.md - Real-world use cases