Overview
TextGrad is a Python framework that brings automatic "differentiation" to text. Instead of numeric gradients, it backpropagates feedback written by an LLM, so you can improve prompts, answers, code, and other text variables through a loop that mirrors how neural networks are trained.
The API deliberately follows PyTorch. You wrap text in a Variable, define a loss with TextLoss, call loss.backward(), and then call optimizer.step() on a TGD (Textual Gradient Descent) optimizer. If you know PyTorch, most of the concepts carry over directly.
It fits the prompt-programming space as an optimization layer: rather than hand-tuning prompts, you let the framework critique and revise text against an evaluation instruction. It works with many model providers through a litellm-based engine, including OpenAI, Anthropic, Gemini, Bedrock, and Together.
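To make the training-loop analogy concrete, here is a minimal offline sketch of a critique-then-revise loop. It is not TextGrad's implementation: `TextVariable`, `backward`, `step`, and the two stub functions are hypothetical names invented for illustration, and the stubs stand in for the LLM calls a real system would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TextVariable:
    """Hypothetical stand-in for a text variable that can collect feedback."""
    value: str
    requires_grad: bool = True
    feedback: List[str] = field(default_factory=list)


def backward(variable: TextVariable, critic: Callable[[str], str]) -> None:
    """'Backward pass': ask the critic for feedback and attach it to the variable."""
    if variable.requires_grad:
        variable.feedback.append(critic(variable.value))


def step(variable: TextVariable, reviser: Callable[[str, List[str]], str]) -> None:
    """'Optimizer step': rewrite the text from the collected feedback, then clear it."""
    if variable.feedback:
        variable.value = reviser(variable.value, variable.feedback)
        variable.feedback.clear()


# Stub "LLM" functions so the sketch runs offline (assumption: a real
# system would call a language model in both places).
def stub_critic(text: str) -> str:
    return "State the units." if "hour" not in text else "Looks fine."


def stub_reviser(text: str, feedback: List[str]) -> str:
    return text + " hour" if "State the units." in feedback else text


answer = TextVariable("The shirts take 1")
for _ in range(2):
    backward(answer, stub_critic)  # collect textual feedback
    step(answer, stub_reviser)     # apply it to the text

print(answer.value)
```

The feedback list plays the role a gradient plays in numeric training: it is accumulated during the backward pass and consumed by the optimizer step.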
What it does
- PyTorch-style API: Variable, TextLoss, backward(), and a TGD (Textual Gradient Descent) optimizer
- Backpropagation through natural-language feedback produced by an LLM
- Optimizes many kinds of text variables: prompts, answers, code, and solutions
- BlackboxLLM wrapper for running a model forward pass on a Variable
- Experimental litellm engine supporting OpenAI, Anthropic, Gemini, Bedrock, Together, and more
- Optional response caching and multimodal (image) input via the litellm engine
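As a rough illustration of what response caching buys you in an optimization loop, the sketch below memoizes responses by prompt so that repeated identical calls do not hit the model again. This is a generic pattern, not TextGrad's or litellm's actual cache: `CachedEngine` and `fake_model` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict


class CachedEngine:
    """Hypothetical wrapper that memoizes responses by (system prompt, content)."""

    def __init__(self, generate_fn: Callable[[str, str], str]) -> None:
        self._generate_fn = generate_fn
        self._cache: Dict[str, str] = {}
        self.calls = 0  # how many times the underlying model was actually called

    def generate(self, content: str, system_prompt: str = "") -> str:
        # Hash both inputs so the same content under a different system
        # prompt gets its own cache entry.
        raw = f"{system_prompt}\x00{content}".encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._generate_fn(content, system_prompt)
        return self._cache[key]


def fake_model(content: str, system_prompt: str) -> str:
    """Offline stand-in for a model call (assumption for this sketch)."""
    return content.upper()


engine = CachedEngine(fake_model)
first = engine.generate("hello", system_prompt="you are an assistant")
second = engine.generate("hello", system_prompt="you are an assistant")
print(first, second, engine.calls)
```

The second call returns the stored response, so `engine.calls` stays at one.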
Getting started
Install the package, set your model API key, then run a short optimize-the-answer loop that mirrors PyTorch.
Install TextGrad
Install from PyPI. You'll also need an API key (for example OpenAI or Anthropic) set in your environment.
pip install textgrad
Set the backward engine and run a forward pass
Pick a model as the backward engine, then wrap your question in a Variable and get an initial answer from a BlackboxLLM.
import textgrad as tg
tg.set_backward_engine("gpt-4o", override=True)
model = tg.BlackboxLLM("gpt-4o")
question_string = ("If it takes 1 hour to dry 25 shirts under the sun, "
                   "how long will it take to dry 30 shirts under the sun? "
                   "Reason step by step")
question = tg.Variable(question_string,
                       role_description="question to the LLM",
                       requires_grad=False)
answer = model(question)
Define a loss and optimize the answer
Write an evaluation instruction as a TextLoss, then run the backward pass and optimizer step, using the same syntax as PyTorch, to revise the answer.
answer.set_role_description("concise and accurate answer to the question")
optimizer = tg.TGD(parameters=[answer])
evaluation_instruction = (f"Here's a question: {question_string}. "
                          "Evaluate any given answer to this question, "
                          "be smart, logical, and very critical. "
                          "Just provide concise feedback.")
loss_fn = tg.TextLoss(evaluation_instruction)
loss = loss_fn(answer)
loss.backward()
optimizer.step()
Commands and code are distilled from the project's own documentation. Always check the official repo for the latest.
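A single backward pass and optimizer step may not be enough, so the sketch below wraps the same calls in a loop. `refine_answer` and its `steps` parameter are hypothetical. The sketch also assumes that `optimizer.zero_grad()` and `answer.value` behave as their PyTorch-style names suggest, so check both against the official repo before relying on them. The import sits inside the function so that defining it needs neither the package nor an API key.

```python
def refine_answer(question_string: str, model_name: str = "gpt-4o", steps: int = 3) -> str:
    """Run several critique-and-revise iterations on an LLM answer (sketch)."""
    import textgrad as tg  # lazy import: only needed when the function is called

    tg.set_backward_engine(model_name, override=True)
    question = tg.Variable(question_string,
                           role_description="question to the LLM",
                           requires_grad=False)
    answer = tg.BlackboxLLM(model_name)(question)
    answer.set_role_description("concise and accurate answer to the question")

    optimizer = tg.TGD(parameters=[answer])
    loss_fn = tg.TextLoss(
        f"Here's a question: {question_string}. "
        "Evaluate any given answer to this question, "
        "be smart, logical, and very critical. "
        "Just provide concise feedback."
    )

    for _ in range(steps):
        optimizer.zero_grad()   # assumed: clears feedback from the previous iteration
        loss = loss_fn(answer)  # LLM-written critique of the current answer
        loss.backward()         # propagate the critique back to `answer`
        optimizer.step()        # rewrite `answer` using that critique
    return answer.value         # assumed: the current text of the variable
```

Calling `refine_answer(question_string)` makes real model calls on every iteration, so keep `steps` small while experimenting.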
When to use it
- Automatically refine an LLM's reasoning or answer when a first response is wrong
- Tune system prompts and instructions instead of editing them by hand
- Improve generated code or problem solutions through iterative LLM critique
- Optimize text variables in research workflows across multiple model providers via litellm
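For the system-prompt use case above, the trainable variable is the prompt rather than the answer. The following is a hedged sketch. `tune_system_prompt` and its parameters are hypothetical. Passing a Variable as the `system_prompt` argument of `BlackboxLLM`, and the `zero_grad()` and `.value` members, are assumptions based on the project's PyTorch-style design, so verify them against the official repo.

```python
def tune_system_prompt(question_string: str,
                       evaluation_instruction: str,
                       starting_prompt: str = "You are a concise LLM. Think step by step.",
                       model_name: str = "gpt-4o",
                       steps: int = 2) -> str:
    """Revise a system prompt from LLM critiques of the model's output (sketch)."""
    import textgrad as tg  # lazy import: only needed when the function is called

    tg.set_backward_engine(model_name, override=True)

    # The system prompt is the text being optimized, so it requires "gradients".
    system_prompt = tg.Variable(starting_prompt,
                                requires_grad=True,
                                role_description="system prompt to the language model")
    model = tg.BlackboxLLM(model_name, system_prompt=system_prompt)  # assumed keyword
    optimizer = tg.TGD(parameters=[system_prompt])
    loss_fn = tg.TextLoss(evaluation_instruction)

    question = tg.Variable(question_string,
                           role_description="question to the LLM",
                           requires_grad=False)

    for _ in range(steps):
        optimizer.zero_grad()         # assumed: clears feedback from the last iteration
        prediction = model(question)  # forward pass with the current system prompt
        loss = loss_fn(prediction)    # critique of the prediction
        loss.backward()               # feedback flows back to the system prompt
        optimizer.step()              # rewrite the system prompt
    return system_prompt.value        # assumed: the current text of the variable
```

In practice you would evaluate over a batch of questions rather than one, so the revised prompt does not overfit to a single example.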
How TextGrad compares
TextGrad alongside the other open-source prompt-programming tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| DSPy | ★ 35.8k | A Stanford framework for programming language models with composable modules and automatic prompt optimization instead of hand-written prompts. |
| ell | ★ 5.9k | A Python library that treats prompts as versioned functions, with tooling to track, visualize, and iterate on them as code. |
| GEPA | ★ 5.5k | A reflective, evolutionary optimizer that improves prompts and other text components of a system using language-model feedback. |
| LMQL | ★ 4.2k | A query language for LLMs that mixes Python control flow with prompts and constraints to script multi-step generation. |
| AdalFlow | ★ 4.2k | A PyTorch-like library for building and auto-optimizing LLM pipelines, tuning prompts across the components of a task. |
| TextGrad | ★ 3.6k | A PyTorch-style framework that optimizes prompts and other text variables by backpropagating LLM-written feedback. |
| Mirascope | ★ 1.5k | A lightweight Python toolkit for writing LLM calls as typed functions with prompt templates, chaining, and a single interface across providers. |