dspy-ts

v0.1 — Experimental Release

This is pre-production software. It has been tested (669 tests passing) but has not been battle-tested in production environments. Do not use it blindly in production; verify any behavior you depend on yourself. We have done our best, but we make no guarantee that everything works correctly, and APIs may change without notice.

Built by AI. This codebase was written and maintained primarily by AI agents (Claude and Codex), with human direction and review. We believe it to be useful, safe, and secure — but verify for yourself. Issues and pull requests are welcome.

Full-parity TypeScript port of DSPy v3.1.3 — the framework for programming, not prompting, language models.

What's Included

Core

  • Signature — typed input/output field declarations with automatic parsing
  • Example / Prediction — structured data containers for demos and outputs
  • Module / BaseModule — composable program units with namedPredictors(), deepCopy(), dumpState() / loadState()
  • LM — abstract language model class with caching, history, usage tracking, and streaming hooks
  • Evaluate — batch evaluation framework with metrics and parallel execution
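
Signatures are usually written as strings, as in the Quick Start below. Here is a minimal sketch of a multi-field signature, assuming the comma-separated field syntax carries over from Python DSPy and that an LM has already been configured:

import { ChainOfThought } from "dspy-ts";

// Assumes configure({ lm }) has already been called (see Quick Start).
// The comma-separated input-field syntax is assumed to match Python DSPy.
const qa = new ChainOfThought("context, question -> answer");

const result = await qa.call({
  context: "DSPy is a framework for programming, not prompting, language models.",
  question: "What is DSPy?",
});
console.log(result.reasoning); // ChainOfThought inserts a reasoning field before the output
console.log(result.answer);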

Predict Modules

  • Predict — the core prediction module, with demo support and automatic temperature adjustment (when n > 1 and temperature ≤ 0.15, it is raised to 0.7); see the sketch after this list
  • ChainOfThought — adds a reasoning field before the output
  • BestOfN — generate N completions, pick best by reward function
  • MultiChainComparison — generate multiple reasoning chains, compare them
  • Refine — iterative self-refinement with feedback
  • ReAct — reasoning + action loops with tool use
  • ProgramOfThought — generate and execute code to produce answers
  • CodeAct — agentic code execution with REPL state
  • RLM — retrieval-augmented language model
  • Parallel — run multiple predict calls concurrently
  • Retrieve — pluggable retrieval module with global retriever config
  • Tool / ToolCall / ToolCalls — structured tool definitions and invocations

Adapters

  • ChatAdapter — formats signatures as chat messages with [[ ## field ## ]] markers
  • JSONAdapter — JSON-formatted output parsing
  • XMLAdapter — XML-formatted output parsing
  • TwoStepAdapter — two-pass extraction (natural language → structured)
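
A sketch of selecting an adapter, assuming configure() accepts an adapter option alongside lm (an assumption; verify against the actual exports):

import { configure, ChatAdapter, JSONAdapter } from "dspy-ts";

// ChatAdapter formats each field with [[ ## field_name ## ]] markers and
// parses the same markers back out of the completion.
configure({ adapter: new ChatAdapter() });

// Switch to JSON-formatted outputs, e.g. for providers with strict JSON modes.
configure({ adapter: new JSONAdapter() });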

Adapter Types

Image, Audio, DSPyFile, History, Code, Reasoning, Document, Citation, Citations — rich types for multimodal and structured content with native response type extraction

Optimizers

Optimizer                          Description
LabeledFewShot                     Select demos from labeled examples
BootstrapFewShot                   Generate demos via bootstrapped execution
BootstrapFewShotWithRandomSearch   Bootstrap + random search over demo sets
BootstrapFewShotWithOptuna         Bootstrap backed by Optuna-style trial search
COPRO                              Collaborative Prompt Optimization — LM-proposed instructions
MIPROv2                            Multi-prompt Instruction Proposal Optimizer with TPE + minibatch eval
SIMBA                              Softmax selection, Poisson demo dropping, rule generation
GEPA                               State-of-the-art reflective prompt evolution (Genetic-Pareto)
KNNFewShot                         k-nearest-neighbor demo selection at inference time
InferRules                         Extract reusable rules from execution traces
Ensemble                           Combine multiple program variants
AvatarOptimizer                    Persona-based optimization
BootstrapFinetune                  Bootstrap training data then finetune the LM
GRPO                               Group Relative Policy Optimization (RL-based weight training)
BetterTogether                     Joint optimization of prompts and finetunes
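
All of these follow DSPy's compile pattern: the optimizer takes a program, a trainset, and a metric, and returns an optimized copy. The sketch below assumes the port keeps that shape; the Example constructor, the metric signature, and the compile() options are assumptions:

import { BootstrapFewShot, ChainOfThought, Example } from "dspy-ts";

// Hypothetical metric: exact match on the answer field.
const exactMatch = (example: any, prediction: any) =>
  example.answer === prediction.answer;

// Trainset shape is an assumption modeled on Python DSPy's dspy.Example.
const trainset = [
  new Example({ question: "What is 2 + 2?", answer: "4" }),
  new Example({ question: "Capital of France?", answer: "Paris" }),
];

const program = new ChainOfThought("question -> answer");
const optimizer = new BootstrapFewShot({ metric: exactMatch });

// Runs the program over the trainset, keeps traces that pass the metric,
// and installs them as few-shot demos on the program's predictors.
const optimized = await optimizer.compile(program, { trainset });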

Infrastructure

  • TPE — Tree-structured Parzen Estimator for Bayesian optimization (used by MIPROv2)
  • KNN — k-nearest-neighbor retriever with pluggable embedding
  • Cache — disk-backed caching with TTL and eviction
  • Embedder — embedding client with caching support
  • Callbacks — full observability system (module start/end, LM start/end, adapter format/parse)
  • Streaming — streamify() wrapper, StreamListener, StatusMessage types (see the sketch after this list)
  • Dataset — data loading utilities
  • UsageTracker — token usage tracking across LM calls
  • ParallelExecutor — concurrent execution with configurable limits
  • Provider / TrainingJob / ReinforceJob — finetuning infrastructure
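
For the streaming entry above, a sketch assuming streamify() mirrors Python DSPy's dspy.streamify, turning a module into a caller that yields incremental chunks before the final prediction; the chunk shapes are assumptions:

import { streamify, ChainOfThought } from "dspy-ts";

const cot = new ChainOfThought("question -> answer");
const streaming = streamify(cot);

// Assumed to yield StatusMessage / token chunks and finally the Prediction.
for await (const chunk of streaming({ question: "What is DSPy?" })) {
  console.log(chunk);
}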

Quick Start

bun install

import { LM, configure, ChainOfThought, type LMConfig, type LMResponse } from "dspy-ts";

// Implement the LM abstract class for your provider
class MyLM extends LM {
  async forward(
    messages: Array<{ role: string; content: string }>,
    config: LMConfig
  ): Promise<LMResponse[]> {
    // Call your LLM provider here
    const response = await callMyProvider(messages, config);
    return [{ text: response.text, usage: response.usage }];
  }
}

const lm = new MyLM({ model: "my-model", temperature: 0.7 });
configure({ lm });

const cot = new ChainOfThought("question -> answer");
const result = await cot.call({ question: "What is DSPy?" });
console.log(result.answer);

Implementing a Custom LM

Extend the LM abstract class and implement forward():

import { LM, type LMConfig, type LMResponse } from "dspy-ts";

class CustomLM extends LM {
  async forward(
    messages: Array<{ role: string; content: string }>,
    config: LMConfig
  ): Promise<LMResponse[]> {
    // config.temperature, config.maxTokens, config.n are available
    // Return an array of LMResponse objects (length = config.n)
    return [{ text: "response text", usage: { promptTokens: 10, completionTokens: 20 } }];
  }
}

The base class handles caching, history tracking, usage aggregation, retry logic, and streaming hooks automatically.
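
As a more concrete sketch, here is an LM backed by an OpenAI-compatible HTTP endpoint. The URL path, request body, and response shape describe a generic provider and are assumptions, not part of dspy-ts:

import { LM, type LMConfig, type LMResponse } from "dspy-ts";

class OpenAICompatibleLM extends LM {
  constructor(
    private baseUrl: string,
    private apiKey: string,
    private modelName: string
  ) {
    super({ model: modelName });
  }

  async forward(
    messages: Array<{ role: string; content: string }>,
    config: LMConfig
  ): Promise<LMResponse[]> {
    // Generic OpenAI-style chat completions request; adapt to your provider.
    const res = await fetch(`${this.baseUrl}/chat/completions`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${this.apiKey}`,
      },
      body: JSON.stringify({
        model: this.modelName,
        messages,
        temperature: config.temperature,
        max_tokens: config.maxTokens,
        n: config.n ?? 1,
      }),
    });
    if (!res.ok) throw new Error(`LM request failed: ${res.status}`);
    const data = await res.json();

    // One LMResponse per returned choice, with usage mapped to the shape above.
    return data.choices.map((choice: { message: { content: string } }) => ({
      text: choice.message.content,
      usage: {
        promptTokens: data.usage?.prompt_tokens ?? 0,
        completionTokens: data.usage?.completion_tokens ?? 0,
      },
    }));
  }
}

const lm = new OpenAICompatibleLM("https://api.example.com/v1", process.env.MY_API_KEY ?? "", "my-model");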

Callbacks

Observe every level of execution:

import { BaseCallback, setGlobalCallbacks, ChainOfThought } from "dspy-ts";

class LoggingCallback extends BaseCallback {
  onModuleStart(callId: string, instance: unknown, inputs: Record<string, unknown>) {
    console.log(`[${(instance as any).constructor.name}] start:`, inputs);
  }
  onModuleEnd(callId: string, outputs: unknown, exception?: Error) {
    console.log("  done:", exception ? `error: ${exception.message}` : outputs);
  }
  onLmStart(callId: string, instance: unknown, inputs: Record<string, unknown>) {
    console.log("  LM call");
  }
  onLmEnd(callId: string, outputs: unknown, exception?: Error) {
    console.log("  LM done");
  }
}

setGlobalCallbacks([new LoggingCallback()]);

Running Tests

bun test           # run all 669 tests
bun test tests/    # run specific directory

Architecture

DSPy programs are built from composable modules, each containing one or more Predict instances. Signatures define typed input/output contracts. Adapters format signatures into LM prompts and parse responses. Optimizers search the space of instructions and demos to maximize a metric.

Module (ChainOfThought, ReAct, etc.)
  └── Predict (core prediction unit)
        ├── Signature (typed I/O contract)
        ├── Adapter (prompt formatting / response parsing)
        └── LM (language model backend)

Optimizers wrap this pipeline: they generate candidate prompts (instructions + demos), evaluate them against a metric, and select the best configuration.
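
A sketch of composing a custom module on top of this pipeline, assuming that subclasses of Module hold their sub-predictors as fields (so namedPredictors() can discover them) and implement a forward() method, as in Python DSPy; the method name and the retrieval helper are assumptions:

import { Module, ChainOfThought } from "dspy-ts";

// Hypothetical stand-in for a real retrieval step.
async function retrieveContext(query: string): Promise<string> {
  return `documents matching "${query}"`;
}

class SimpleRAG extends Module {
  // Sub-predictors stored as fields, assumed discoverable by namedPredictors().
  generateQuery = new ChainOfThought("question -> searchQuery");
  answerQuestion = new ChainOfThought("context, question -> answer");

  async forward(inputs: { question: string }) {
    const { searchQuery } = await this.generateQuery.call({ question: inputs.question });
    const context = await retrieveContext(String(searchQuery));
    return this.answerQuestion.call({ context, question: inputs.question });
  }
}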

License

MIT
