v0.1 — Experimental Release
This is pre-production software. It has been tested (669 tests passing) but has not been battle-tested in production environments. Do not use blindly in production. Verify any behavior you depend on with your own eyes. We tried our best, but we make no guarantees that everything works correctly. APIs may change without notice.
Built by AI. This codebase was written and maintained primarily by AI agents (Claude and Codex), with human direction and review. We believe it to be useful, safe, and secure — but verify for yourself. Issues and pull requests are welcome.
Full-parity TypeScript port of DSPy v3.1.3 — the framework for programming, not prompting, language models.
- Signature — typed input/output field declarations with automatic parsing
- Example / Prediction — structured data containers for demos and outputs
- Module / BaseModule — composable program units with `namedPredictors()`, `deepCopy()`, and `dumpState()`/`loadState()`
- LM — abstract language model class with caching, history, usage tracking, and streaming hooks
- Evaluate — batch evaluation framework with metrics and parallel execution
- Predict — core prediction module with demos and temperature auto-adjustment (n > 1 with temperature ≤ 0.15 is raised to 0.7); see the usage sketch below
- ChainOfThought — adds a `reasoning` field before the output
- BestOfN — generate N completions, pick best by reward function
- MultiChainComparison — generate multiple reasoning chains, compare them
- Refine — iterative self-refinement with feedback
- ReAct — reasoning + action loops with tool use
- ProgramOfThought — generate and execute code to produce answers
- CodeAct — agentic code execution with REPL state
- RLM — retrieval-augmented language model
- Parallel — run multiple predict calls concurrently
- Retrieve — pluggable retrieval module with global retriever config
- Tool / ToolCall / ToolCalls — structured tool definitions and invocations
- ChatAdapter — formats signatures as chat messages with `[[ ## field ## ]]` markers
- JSONAdapter — JSON-formatted output parsing
- XMLAdapter — XML-formatted output parsing
- TwoStepAdapter — two-pass extraction (natural language → structured)
Image, Audio, DSPyFile, History, Code, Reasoning, Document, Citation, Citations — rich types for multimodal and structured content with native response type extraction
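For orientation, here is a minimal sketch of how Example, Predict, and a string signature fit together. The string-signature constructor and `.call()` follow the quick start further below; the `demos` property and the `Example` constructor shape are assumptions modeled on Python DSPy, not confirmed API of this port:

```ts
import { Predict, Example } from "dspy-ts";

// Assumes an LM has already been configured via configure({ lm }),
// as shown in the quick start below.
// Illustrative only: assumes Predict accepts a string signature and that
// few-shot demos are Example instances attached to the predictor.
const qa = new Predict("question -> answer");

qa.demos = [
  new Example({ question: "What is 2 + 2?", answer: "4" }),
  new Example({ question: "What color is the sky?", answer: "Blue" }),
];

const prediction = await qa.call({ question: "Who wrote Hamlet?" });
console.log(prediction.answer);
```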
| Optimizer | Description |
|---|---|
| LabeledFewShot | Select demos from labeled examples |
| BootstrapFewShot | Generate demos via bootstrapped execution |
| BootstrapFewShotWithRandomSearch | Bootstrap + random search over demo sets |
| BootstrapFewShotWithOptuna | Bootstrap backed by Optuna-style trial search |
| COPRO | Collaborative Prompt Optimization — LM-proposed instructions |
| MIPROv2 | Multi-prompt Instruction Proposal Optimizer with TPE + minibatch eval |
| SIMBA | Softmax selection, Poisson demo dropping, rule generation |
| GEPA | Reflective prompt evolution via Genetic-Pareto candidate search |
| KNNFewShot | k-nearest-neighbor demo selection at inference time |
| InferRules | Extract reusable rules from execution traces |
| Ensemble | Combine multiple program variants |
| AvatarOptimizer | Persona-based optimization |
| BootstrapFinetune | Bootstrap training data then finetune the LM |
| GRPO | Group Relative Policy Optimization (RL-based weight training) |
| BetterTogether | Joint optimization of prompts and finetunes |
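A hedged sketch of the typical optimizer workflow, assuming the TS port mirrors Python DSPy's compile pattern. The constructor options, metric signature, and `compile` call shape here are assumptions, not confirmed API:

```ts
import { BootstrapFewShot, ChainOfThought, Example } from "dspy-ts";

// Labeled data to bootstrap demos from.
const trainset = [
  new Example({ question: "Largest planet in the solar system?", answer: "Jupiter" }),
  new Example({ question: "Capital of France?", answer: "Paris" }),
];

// A metric scores a (gold example, prediction) pair; the exact signature is assumed.
const exactMatch = (example: any, prediction: any) =>
  prediction.answer?.trim() === example.answer ? 1 : 0;

const program = new ChainOfThought("question -> answer");

// Assumed to mirror Python DSPy: run the program over the trainset, keep traces
// that pass the metric, and install them as few-shot demos on the program.
const optimizer = new BootstrapFewShot({ metric: exactMatch });
const optimized = await optimizer.compile(program, { trainset });
```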
- TPE — Tree-structured Parzen Estimator for Bayesian optimization (used by MIPROv2)
- KNN — k-nearest-neighbor retriever with pluggable embedding
- Cache — disk-backed caching with TTL and eviction
- Embedder — embedding client with caching support
- Callbacks — full observability system (module start/end, LM start/end, adapter format/parse)
- Streaming — `streamify()` wrapper, `StreamListener`, `StatusMessage` types
- Dataset — data loading utilities
- UsageTracker — token usage tracking across LM calls
- ParallelExecutor — concurrent execution with configurable limits
- Provider / TrainingJob / ReinforceJob — finetuning infrastructure
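As a sketch of batch evaluation, the snippet below scores a program against a dev set. The options-object constructor (`devset`, `metric`, `numThreads`) and the `.call()` invocation are assumptions modeled on Python DSPy's `dspy.Evaluate`, not confirmed API:

```ts
import { Evaluate, ChainOfThought, Example } from "dspy-ts";

// Assumes an LM has been configured via configure({ lm }).
const devset = [
  new Example({ question: "H2O is commonly called?", answer: "water" }),
  new Example({ question: "Largest ocean?", answer: "Pacific" }),
];

// Simple containment metric; the exact metric signature is assumed.
const metric = (example: any, prediction: any) =>
  prediction.answer?.toLowerCase().includes(example.answer.toLowerCase()) ? 1 : 0;

const program = new ChainOfThought("question -> answer");

// Assumed options: held-out examples, a metric, and a parallelism limit.
const evaluate = new Evaluate({ devset, metric, numThreads: 4 });
const score = await evaluate.call(program);
console.log(`Average metric: ${score}`);
```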
```bash
bun install
```

```ts
import { LM, configure, ChainOfThought, type LMConfig, type LMResponse } from "dspy-ts";

// Implement the LM abstract class for your provider
class MyLM extends LM {
  async forward(
    messages: Array<{ role: string; content: string }>,
    config: LMConfig
  ): Promise<LMResponse[]> {
    // Call your LLM provider here
    const response = await callMyProvider(messages, config);
    return [{ text: response.text, usage: response.usage }];
  }
}

const lm = new MyLM({ model: "my-model", temperature: 0.7 });
configure({ lm });

const cot = new ChainOfThought("question -> answer");
const result = await cot.call({ question: "What is DSPy?" });
console.log(result.answer);
```

Extend the `LM` abstract class and implement `forward()`:
```ts
import { LM, type LMConfig, type LMResponse } from "dspy-ts";

class CustomLM extends LM {
  async forward(
    messages: Array<{ role: string; content: string }>,
    config: LMConfig
  ): Promise<LMResponse[]> {
    // config.temperature, config.maxTokens, config.n are available
    // Return an array of LMResponse objects (length = config.n)
    return [{ text: "response text", usage: { promptTokens: 10, completionTokens: 20 } }];
  }
}
```

The base class handles caching, history tracking, usage aggregation, retry logic, and streaming hooks automatically.
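To illustrate what the base class tracks, the sketch below inspects call history after a prediction. The `history` property and the shape of its entries are assumptions based on Python DSPy's `lm.history`; only the fact that the LM tracks history and usage is confirmed above:

```ts
import { configure, ChainOfThought } from "dspy-ts";

// Reuses the CustomLM class defined in the previous example.
const lm = new CustomLM({ model: "my-model" });
configure({ lm });

const cot = new ChainOfThought("question -> answer");
await cot.call({ question: "What is a signature?" });

// Assumption: the base LM records each call (prompt, response, token usage),
// analogous to lm.history in Python DSPy.
console.log(lm.history.length);
console.log(lm.history[lm.history.length - 1]);
```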
Observe every level of execution:
```ts
import { BaseCallback, setGlobalCallbacks, ChainOfThought } from "dspy-ts";

class LoggingCallback extends BaseCallback {
  onModuleStart(callId: string, instance: unknown, inputs: Record<string, unknown>) {
    console.log(`[${(instance as any).constructor.name}] start:`, inputs);
  }
  onModuleEnd(callId: string, outputs: unknown, exception?: Error) {
    console.log(" done:", exception ? `error: ${exception.message}` : outputs);
  }
  onLmStart(callId: string, instance: unknown, inputs: Record<string, unknown>) {
    console.log(" LM call");
  }
  onLmEnd(callId: string, outputs: unknown, exception?: Error) {
    console.log(" LM done");
  }
}

setGlobalCallbacks([new LoggingCallback()]);
```

```bash
bun test          # run all 669 tests
bun test tests/   # run specific directory
```

DSPy programs are built from composable modules, each containing one or more Predict instances. Signatures define typed input/output contracts. Adapters format signatures into LM prompts and parse responses. Optimizers search the space of instructions and demos to maximize a metric.
```
Module (ChainOfThought, ReAct, etc.)
└── Predict (core prediction unit)
    ├── Signature (typed I/O contract)
    ├── Adapter (prompt formatting / response parsing)
    └── LM (language model backend)
```
Optimizers wrap this pipeline: they generate candidate prompts (instructions + demos), evaluate them against a metric, and select the best configuration.
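To make the composition concrete, here is a hypothetical two-stage module. It assumes custom modules extend `Module` and implement `forward()`, mirroring the `LM` subclassing pattern shown earlier; the exact base-class contract is an assumption:

```ts
import { Module, ChainOfThought } from "dspy-ts";

// Hypothetical composite module: draft an answer, then refine it.
// Assumes Module subclasses implement forward() and are invoked via call().
class DraftThenRefine extends Module {
  draft = new ChainOfThought("question -> answer");
  refine = new ChainOfThought("question, draft -> answer");

  async forward(inputs: { question: string }) {
    const first = await this.draft.call({ question: inputs.question });
    return this.refine.call({ question: inputs.question, draft: first.answer });
  }
}

const program = new DraftThenRefine();
const result = await program.call({ question: "Why do optimizers need a metric?" });
console.log(result.answer);
```

Because the sub-predictors are discoverable via `namedPredictors()`, a composite module like this can be optimized as a whole: optimizers tune the instructions and demos of each inner Predict against the chosen metric.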
MIT