DSPy
Declarative LM programs, automatic prompt optimization, and RAG.
Skill metadata
| Source | Bundled (installed by default) |
| Path | skills/mlops/research/dspy |
| Version | 1.0.0 |
| Author | Orchestra Research |
| License | MIT |
| Dependencies | dspy, openai, anthropic |
| Tags | Prompt Engineering, DSPy, Declarative Programming, RAG, Agents, Prompt Optimization, LM Programming, Stanford NLP, Automatic Optimization, Modular AI |
Reference: full SKILL.md
Info: The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
DSPy: Declarative Language Model Programming
When to Use This Skill
Use DSPy when you need to:
- Build complex AI systems with multiple components and workflows
- Program LMs declaratively instead of manual prompt engineering
- Optimize prompts automatically using data-driven methods
- Create modular AI pipelines that are maintainable and portable
- Improve model outputs systematically with optimizers
- Build RAG systems, agents, or classifiers with better reliability
GitHub Stars: 22,000+ | Created By: Stanford NLP
Installation

    # Stable release
    pip install dspy

    # Latest development version
    pip install git+https://github.com/stanfordnlp/dspy.git

    # With specific LM providers (quoted so shells such as zsh do not expand the brackets)
    pip install "dspy[openai]"     # OpenAI
    pip install "dspy[anthropic]"  # Anthropic Claude
    pip install "dspy[all]"        # All providers
Quick Start
Basic Example: Question Answering

    import dspy

    # Configure your language model.
    # dspy.Claude is the legacy per-provider client; DSPy 2.5+ also offers
    # the unified dspy.LM("anthropic/<model-name>") client.
    lm = dspy.Claude(model="claude-sonnet-4-5-20250929")
    dspy.settings.configure(lm=lm)

    # Define a signature (input → output)
    class QA(dspy.Signature):
        """Answer questions with short factual answers."""
        question = dspy.InputField()
        answer = dspy.OutputField(desc="often between 1 and 5 words")

    # Create a module
    qa = dspy.Predict(QA)

    # Use it
    response = qa(question="What is the capital of France?")
    print(response.answer)  # e.g. "Paris"
Chain of Thought Reasoning

    import dspy

    lm = dspy.Claude(model="claude-sonnet-4-5-20250929")
    dspy.settings.configure(lm=lm)

    # Use ChainOfThought for better reasoning
    class MathProblem(dspy.Signature):
        """Solve math word problems."""
        problem = dspy.InputField()
        answer = dspy.OutputField(desc="numerical answer")

    # ChainOfThought generates reasoning steps automatically
    cot = dspy.ChainOfThought(MathProblem)

    response = cot(problem="If John has 5 apples and gives 2 to Mary, how many does he have?")
    print(response.rationale)  # Shows reasoning steps
    print(response.answer)     # "3"
Core Concepts
1. Signatures
Signatures define the structure of your AI task (inputs → outputs):

    # Inline signature (simple)
    qa = dspy.Predict("question -> answer")

    # Class signature (detailed)
    class Summarize(dspy.Signature):
        """Summarize text into key points."""
        text = dspy.InputField()
        summary = dspy.OutputField(desc="bullet points, 3-5 items")

    summarizer = dspy.ChainOfThought(Summarize)

When to use each:
- Inline: Quick prototyping, simple tasks
- Class: Complex tasks, type hints, better documentation
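To make the inline form concrete, the sketch below splits a signature string into named input and output fields. It is a plain-Python illustration, not DSPy's own parser: `parse_fields` and `parse_signature` are hypothetical helpers, and treating an untyped field as `str` is an assumption of this sketch.

```python
def parse_fields(part: str) -> dict[str, str]:
    """Map each comma-separated field to a type name (assumed "str" if untyped)."""
    fields = {}
    for item in part.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, type_name = item.partition(":")
        fields[name.strip()] = type_name.strip() or "str"
    return fields


def parse_signature(spec: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split "inputs -> outputs" into two field dictionaries."""
    if spec.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {spec!r}")
    left, right = spec.split("->")
    inputs, outputs = parse_fields(left), parse_fields(right)
    if not inputs or not outputs:
        raise ValueError(f"signature needs inputs and outputs: {spec!r}")
    return inputs, outputs


inputs, outputs = parse_signature("question, passage -> relevance_score: float")
print(inputs)   # {'question': 'str', 'passage': 'str'}
print(outputs)  # {'relevance_score': 'float'}
```

The class form carries the same information plus a docstring and per-field descriptions, which is why it suits more complex tasks.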
2. Modules
Modules are reusable components that transform inputs to outputs:
dspy.Predict
Basic prediction module:

    predictor = dspy.Predict("context, question -> answer")
    result = predictor(context="Paris is the capital of France", question="What is the capital?")
dspy.ChainOfThought
Generates reasoning steps before answering:

    cot = dspy.ChainOfThought("question -> answer")
    result = cot(question="Why is the sky blue?")
    print(result.rationale)  # Reasoning steps
    print(result.answer)     # Final answer
dspy.ReAct
Agent-like reasoning with tools:

    import dspy
    from dspy.predict import ReAct

    class SearchQA(dspy.Signature):
        """Answer questions using search."""
        question = dspy.InputField()
        answer = dspy.OutputField()

    def search_tool(query: str) -> str:
        """Search Wikipedia."""
        # Your search implementation goes here; this placeholder returns no results
        results = ""
        return results

    react = ReAct(SearchQA, tools=[search_tool])
    result = react(question="When was Python created?")
dspy.ProgramOfThought
Generates and executes code for reasoning:

    pot = dspy.ProgramOfThought("question -> answer")
    result = pot(question="What is 15% of 240?")
    # Generates: answer = 240 * 0.15
3. Optimizers
Optimizers improve your modules automatically using training data:
BootstrapFewShot
Learns from examples:

    from dspy.teleprompt import BootstrapFewShot

    # Training data
    trainset = [
        dspy.Example(question="What is 2+2?", answer="4").with_inputs("question"),
        dspy.Example(question="What is 3+5?", answer="8").with_inputs("question"),
    ]

    # Define metric
    def validate_answer(example, pred, trace=None):
        return example.answer == pred.answer

    # Optimize
    optimizer = BootstrapFewShot(metric=validate_answer, max_bootstrapped_demos=3)
    optimized_qa = optimizer.compile(qa, trainset=trainset)

    # optimized_qa now carries bootstrapped few-shot demonstrations in its prompt
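The idea behind bootstrapping can be sketched without an LM: run a teacher over the training set, keep only the outputs that pass the metric, and reuse them as few-shot demonstrations. This is a simplified picture rather than the library's implementation, and every name here (`toy_teacher`, `bootstrap_demos`, `build_prompt`, the canned answers) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str


# Hypothetical stand-in for an LM: canned answers, one of them wrong on purpose.
CANNED_ANSWERS = {
    "What is 2+2?": "4",
    "What is 3+5?": "9",  # deliberately incorrect
    "What is 1+6?": "7",
}


def toy_teacher(question: str) -> str:
    return CANNED_ANSWERS[question]


def exact_match(example: Example, prediction: str) -> bool:
    return example.answer == prediction


def bootstrap_demos(trainset, teacher, metric, max_demos):
    """Keep teacher outputs that pass the metric, up to max_demos."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = teacher(example.question)
        if metric(example, prediction):
            demos.append((example.question, prediction))
    return demos


def build_prompt(demos, question: str) -> str:
    """Prepend the accepted demonstrations to a new question."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in demos]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


trainset = [
    Example("What is 2+2?", "4"),
    Example("What is 3+5?", "8"),
    Example("What is 1+6?", "7"),
]
demos = bootstrap_demos(trainset, toy_teacher, exact_match, max_demos=3)
print(demos)  # [('What is 2+2?', '4'), ('What is 1+6?', '7')]
print(build_prompt(demos, "What is 4+4?"))
```

The wrong answer for "3+5" is filtered out by the metric, which is why the quality of the metric matters as much as the size of the training set.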
MIPRO (Multiprompt Instruction Proposal Optimizer)
Iteratively proposes and scores candidate instructions and few-shot demonstrations:

    from dspy.teleprompt import MIPRO

    optimizer = MIPRO(
        metric=validate_answer,
        num_candidates=10,
        init_temperature=1.0
    )

    optimized_cot = optimizer.compile(
        cot,
        trainset=trainset,
        num_trials=100
    )
BootstrapFinetune
Creates datasets for model fine-tuning:

    from dspy.teleprompt import BootstrapFinetune

    optimizer = BootstrapFinetune(metric=validate_answer)
    optimized_module = optimizer.compile(qa, trainset=trainset)

    # Exports training data for fine-tuning
4. Building Complex Systems
Multi-Stage Pipeline

    import dspy

    class MultiHopQA(dspy.Module):
        def __init__(self):
            super().__init__()
            self.retrieve = dspy.Retrieve(k=3)
            self.generate_query = dspy.ChainOfThought("question -> search_query")
            self.generate_answer = dspy.ChainOfThought("context, question -> answer")

        def forward(self, question):
            # Stage 1: Generate search query
            search_query = self.generate_query(question=question).search_query

            # Stage 2: Retrieve context
            passages = self.retrieve(search_query).passages
            context = "\n".join(passages)

            # Stage 3: Generate answer
            answer = self.generate_answer(context=context, question=question).answer
            return dspy.Prediction(answer=answer, context=context)

    # Use the pipeline
    qa_system = MultiHopQA()
    result = qa_system(question="Who wrote the book that inspired the movie Blade Runner?")
RAG System with Optimization

    import dspy
    from dspy.retrieve.chromadb_rm import ChromadbRM
    from dspy.teleprompt import BootstrapFewShot

    # Configure the retriever and register it so dspy.Retrieve can use it
    retriever = ChromadbRM(
        collection_name="documents",
        persist_directory="./chroma_db"
    )
    dspy.settings.configure(rm=retriever)

    class RAG(dspy.Module):
        def __init__(self, num_passages=3):
            super().__init__()
            self.retrieve = dspy.Retrieve(k=num_passages)
            self.generate = dspy.ChainOfThought("context, question -> answer")

        def forward(self, question):
            context = self.retrieve(question).passages
            return self.generate(context=context, question=question)

    # Create and optimize
    rag = RAG()

    # Optimize with training data
    optimizer = BootstrapFewShot(metric=validate_answer)
    optimized_rag = optimizer.compile(rag, trainset=trainset)
LM Provider Configuration
Anthropic Claude

    import dspy

    lm = dspy.Claude(
        model="claude-sonnet-4-5-20250929",
        api_key="your-api-key",  # Or set ANTHROPIC_API_KEY env var
        max_tokens=1000,
        temperature=0.7
    )
    dspy.settings.configure(lm=lm)
OpenAI

    lm = dspy.OpenAI(
        model="gpt-4",
        api_key="your-api-key",
        max_tokens=1000
    )
    dspy.settings.configure(lm=lm)

Local Models (Ollama)

    lm = dspy.OllamaLocal(
        model="llama3.1",
        base_url="http://localhost:11434"
    )
    dspy.settings.configure(lm=lm)
Multiple Models

    # Different models for different tasks
    cheap_lm = dspy.OpenAI(model="gpt-3.5-turbo")
    strong_lm = dspy.Claude(model="claude-sonnet-4-5-20250929")

    # Use cheap model for retrieval, strong model for reasoning
    with dspy.settings.context(lm=cheap_lm):
        context = retriever(question)

    with dspy.settings.context(lm=strong_lm):
        answer = generator(context=context, question=question)
Common Patterns
Pattern 1: Structured Output

    import dspy
    from pydantic import BaseModel, Field

    class PersonInfo(BaseModel):
        name: str = Field(description="Full name")
        age: int = Field(description="Age in years")
        occupation: str = Field(description="Current job")

    class ExtractPerson(dspy.Signature):
        """Extract person information from text."""
        text = dspy.InputField()
        person: PersonInfo = dspy.OutputField()

    extractor = dspy.TypedPredictor(ExtractPerson)
    result = extractor(text="John Doe is a 35-year-old software engineer.")
    print(result.person.name)  # "John Doe"
    print(result.person.age)   # 35
Pattern 2: Assertion-Driven Optimization

    import dspy
    from dspy.primitives.assertions import assert_transform_module, backtrack_handler

    def is_number(value):
        """Return True if value can be parsed as a float."""
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    class MathQA(dspy.Module):
        def __init__(self):
            super().__init__()
            self.solve = dspy.ChainOfThought("problem -> solution: float")

        def forward(self, problem):
            solution = self.solve(problem=problem).solution

            # Assert solution is numeric
            dspy.Assert(is_number(solution), "Solution must be a number")

            return dspy.Prediction(solution=solution)

    # Wrap the module so a failed assertion backtracks and retries
    math_qa = assert_transform_module(MathQA(), backtrack_handler)
Pattern 3: Self-Consistency

    import dspy
    from collections import Counter

    class ConsistentQA(dspy.Module):
        def __init__(self, num_samples=5):
            super().__init__()
            self.qa = dspy.ChainOfThought("question -> answer")
            self.num_samples = num_samples

        def forward(self, question):
            # Generate multiple answers
            answers = []
            for _ in range(self.num_samples):
                result = self.qa(question=question)
                answers.append(result.answer)

            # Return most common answer
            most_common = Counter(answers).most_common(1)[0][0]
            return dspy.Prediction(answer=most_common)
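Majority voting only helps if equivalent answers are counted together. The sketch below normalizes answers before voting, so that "Paris" and "paris." land in the same bucket. The helpers `normalize` and `majority_vote` are hypothetical, and the normalization rules (lowercasing, trimming whitespace and surrounding punctuation) are assumptions you would adapt to your task.

```python
import string
from collections import Counter


def normalize(answer: str) -> str:
    """Trim whitespace and surrounding punctuation, then lowercase."""
    return answer.strip().strip(string.punctuation).strip().lower()


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer after normalization."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(answer) for answer in answers)
    return counts.most_common(1)[0][0]


samples = ["Paris", "paris.", " Paris ", "Lyon", "Lyon"]
print(majority_vote(samples))  # paris
```

Without normalization the same samples would vote "Lyon" (two exact matches against three distinct spellings of Paris).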
Pattern 4: Retrieval with Reranking

    class RerankedRAG(dspy.Module):
        def __init__(self):
            super().__init__()
            self.retrieve = dspy.Retrieve(k=10)
            self.rerank = dspy.Predict("question, passage -> relevance_score: float")
            self.answer = dspy.ChainOfThought("context, question -> answer")

        def forward(self, question):
            # Retrieve candidates
            passages = self.retrieve(question).passages

            # Rerank passages
            scored = []
            for passage in passages:
                score = float(self.rerank(question=question, passage=passage).relevance_score)
                scored.append((score, passage))

            # Take top 3
            top_passages = [p for _, p in sorted(scored, reverse=True)[:3]]
            context = "\n\n".join(top_passages)

            # Generate answer
            return self.answer(context=context, question=question)
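The selection step of reranking can be isolated and tested on its own. The sketch below parses model-produced scores defensively and keeps the top passages by score only. `parse_score` and `top_k_passages` are hypothetical helpers, and falling back to 0.0 for an unparseable score is an assumption of this sketch, not DSPy behavior.

```python
import heapq


def parse_score(raw, default: float = 0.0) -> float:
    """Convert a model-produced score to float, falling back to a default."""
    try:
        return float(str(raw).strip())
    except ValueError:
        return default


def top_k_passages(scored, k: int = 3):
    """Return the k passages with the highest scores, best first."""
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [passage for _, passage in best]


# Raw scores as an LM might return them; "high" is not a number.
raw_scores = {"a": "0.2", "b": "0.9", "c": "0.5", "d": "0.7", "e": "high"}
scored = [(parse_score(raw), passage) for passage, raw in raw_scores.items()]
print(top_k_passages(scored))  # ['b', 'd', 'c']
```

Ranking with an explicit key compares scores only, so passages never need to be comparable with each other.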
Evaluation and Metrics
Custom Metrics

    def exact_match(example, pred, trace=None):
        """Exact match metric."""
        return example.answer.lower() == pred.answer.lower()

    def f1_score(example, pred, trace=None):
        """F1 score for text overlap."""
        pred_tokens = set(pred.answer.lower().split())
        gold_tokens = set(example.answer.lower().split())

        # Guard against empty answers to avoid division by zero
        if not pred_tokens or not gold_tokens:
            return 0.0

        overlap = len(pred_tokens & gold_tokens)
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)

        if precision + recall == 0:
            return 0.0

        return 2 * (precision * recall) / (precision + recall)
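A worked example helps check the arithmetic of the token-overlap F1. The sketch below restates the metric on plain strings (`token_f1` is a hypothetical name used only here) and runs it on two small cases.

```python
def token_f1(gold: str, predicted: str) -> float:
    """Set-based token F1 between a gold answer and a prediction."""
    gold_tokens = set(gold.lower().split())
    pred_tokens = set(predicted.lower().split())
    if not gold_tokens or not pred_tokens:
        return 0.0
    overlap = len(gold_tokens & pred_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# 4 shared tokens: precision = 4/6, recall = 4/4, so F1 = 0.8
score = token_f1("the capital is Paris", "Paris is the capital of France")
print(round(score, 2))  # 0.8

# Repeated tokens collapse in a set, so repetition is not penalized
print(token_f1("yes", "yes yes yes"))  # 1.0
```

Because tokens are compared as sets, this variant ignores word order and repetition; a count-based overlap would be stricter.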
Evaluation

    from dspy.evaluate import Evaluate

    # Create evaluator
    evaluator = Evaluate(
        devset=testset,
        metric=exact_match,
        num_threads=4,
        display_progress=True
    )

    # Evaluate model
    score = evaluator(qa_system)
    print(f"Accuracy: {score}")

    # Compare optimized vs unoptimized (Evaluate reports scores as percentages, 0-100)
    score_before = evaluator(qa)
    score_after = evaluator(optimized_qa)
    print(f"Improvement: {score_after - score_before:.2f} percentage points")
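Conceptually, an evaluator averages a metric over a held-out set. The sketch below shows that loop in plain Python with a stub program. `evaluate` and `stub_program` are hypothetical, the single `question` input and the percentage scale are assumptions of this sketch, and it leaves out the threading and progress display of a real evaluator.

```python
from types import SimpleNamespace


def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()


def evaluate(program, devset, metric) -> float:
    """Average a metric over a dev set and report it as a percentage."""
    if not devset:
        raise ValueError("devset must not be empty")
    total = sum(float(metric(ex, program(ex.question))) for ex in devset)
    return 100.0 * total / len(devset)


def stub_program(question: str):
    # Hypothetical stand-in for a DSPy module: answers two questions only.
    answers = {"What is 2+2?": "4", "Capital of France?": "paris"}
    return SimpleNamespace(answer=answers.get(question, "unknown"))


devset = [
    SimpleNamespace(question="What is 2+2?", answer="4"),
    SimpleNamespace(question="Capital of France?", answer="Paris"),
    SimpleNamespace(question="Largest planet?", answer="Jupiter"),
    SimpleNamespace(question="What is 3+5?", answer="8"),
]
score = evaluate(stub_program, devset, exact_match)
print(f"{score:.1f}%")  # 50.0%
```

Running the same loop on the unoptimized and optimized programs gives two scores on the same scale, which is what makes the before/after comparison meaningful.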
Best Practices
1. Start Simple, Iterate

    # Start with Predict
    qa = dspy.Predict("question -> answer")

    # Add reasoning if needed
    qa = dspy.ChainOfThought("question -> answer")

    # Add optimization when you have data
    optimized_qa = optimizer.compile(qa, trainset=data)
2. Use Descriptive Signatures

    # ❌ Bad: Vague
    class Task(dspy.Signature):
        input = dspy.InputField()
        output = dspy.OutputField()

    # ✅ Good: Descriptive
    class SummarizeArticle(dspy.Signature):
        """Summarize news articles into 3-5 key points."""
        article = dspy.InputField(desc="full article text")
        summary = dspy.OutputField(desc="bullet points, 3-5 items")
3. Optimize with Representative Data

    # Create diverse training examples
    trainset = [
        dspy.Example(question="factual", answer="...").with_inputs("question"),
        dspy.Example(question="reasoning", answer="...").with_inputs("question"),
        dspy.Example(question="calculation", answer="...").with_inputs("question"),
    ]

    # Use validation set for metric
    def metric(example, pred, trace=None):
        return example.answer in pred.answer
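Keeping a validation set separate from the training set can be done with a small deterministic split. The helper below is a hypothetical sketch: `split_examples`, the 20% default, and the fixed seed are assumptions, and the items can be any objects, including `dspy.Example` instances.

```python
import random


def split_examples(examples, dev_fraction: float = 0.2, seed: int = 0):
    """Shuffle deterministically and split into (trainset, devset)."""
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")
    if not 0.0 < dev_fraction < 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one example on each side of the split
    n_dev = min(len(shuffled) - 1, max(1, int(len(shuffled) * dev_fraction)))
    return shuffled[n_dev:], shuffled[:n_dev]


examples = [f"example-{i}" for i in range(10)]
trainset, devset = split_examples(examples)
print(len(trainset), len(devset))  # 8 2
```

Optimizing on `trainset` and scoring on `devset` avoids reporting numbers on examples the optimizer has already seen.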
4. Save and Load Optimized Models

    # Save
    optimized_qa.save("models/qa_v1.json")

    # Load
    loaded_qa = dspy.ChainOfThought("question -> answer")
    loaded_qa.load("models/qa_v1.json")
5. Monitor and Debug

    # Enable tracing
    dspy.settings.configure(lm=lm, trace=[])

    # Run prediction
    result = qa(question="...")

    # Inspect trace
    for call in dspy.settings.trace:
        print(f"Prompt: {call['prompt']}")
        print(f"Response: {call['response']}")
Comparison to Other Approaches

| Feature | Manual Prompting | LangChain | DSPy |
|---|---|---|---|
| Prompt Engineering | Manual | Manual | Automatic |
| Optimization | Trial & error | None | Data-driven |
| Modularity | Low | Medium | High |
| Type Safety | No | Limited | Yes (Signatures) |
| Portability | Low | Medium | High |
| Learning Curve | Low | Medium | Medium-High |
When to choose DSPy:
- You have training data or can generate it
- You need systematic prompt improvement
- You’re building complex multi-stage systems
- You want to optimize across different LMs
When to choose alternatives:
- Quick prototypes (manual prompting)
- Simple chains with existing tools (LangChain)
- Custom optimization logic needed
Resources
- Documentation: https://dspy.ai
- GitHub: https://github.com/stanfordnlp/dspy (22k+ stars)
- Discord: https://discord.gg/XCGy2WDCQB
- Twitter: @DSPyOSS
- Paper: “DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines”
See Also
- references/modules.md - Detailed module guide (Predict, ChainOfThought, ReAct, ProgramOfThought)
- references/optimizers.md - Optimization algorithms (BootstrapFewShot, MIPRO, BootstrapFinetune)
- references/examples.md - Real-world examples (RAG, agents, classifiers)