TextGrad

Optimize prompts and text variables with backpropagation through LLM feedback

github.com/zou-group/textgrad · ★ 3.6k · textgrad.com

Overview

TextGrad is a Python framework that brings automatic "differentiation" to text. Instead of numeric gradients, it backpropagates feedback written by an LLM, so you can improve prompts, answers, code, and other text variables through a loop that mirrors how neural networks are trained.
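
To make the idea of a "textual gradient" concrete, here is a minimal offline sketch of the critique-and-revise loop described above. It is not TextGrad's implementation: the function textual_gradient_descent and the two stand-in callables toy_critic and toy_reviser are hypothetical names used only for illustration, and in real use both roles would be played by LLM calls.

```python
from typing import Callable


def textual_gradient_descent(
    text: str,
    critic: Callable[[str], str],
    reviser: Callable[[str, str], str],
    steps: int = 3,
) -> str:
    """Repeatedly critique `text` and revise it using the critique."""
    for _ in range(steps):
        feedback = critic(text)         # the "gradient": a natural-language critique
        if not feedback:                # empty critique means nothing left to fix
            break
        text = reviser(text, feedback)  # the "step": rewrite guided by the critique
    return text


# Hypothetical stand-ins for LLM calls so the sketch runs without a network.
def toy_critic(text: str) -> str:
    return "" if "hour" in text else "State the unit."


def toy_reviser(text: str, feedback: str) -> str:
    return text + " hour" if feedback == "State the unit." else text


result = textual_gradient_descent("Drying 30 shirts takes 1", toy_critic, toy_reviser)
print(result)  # prints: Drying 30 shirts takes 1 hour
```

The loop stops early once the critic has nothing more to say, which loosely mirrors stopping training when the loss no longer improves.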

The API deliberately follows PyTorch. You wrap text in a Variable, define a loss with TextLoss, then call loss.backward() and optimizer.step(), where the optimizer is TGD (Textual Gradient Descent). If you know PyTorch, most of the concepts carry over directly.

It fits the prompt-programming space as an optimization layer: rather than hand-tuning prompts, you let the framework critique and revise text against an evaluation instruction. Through an experimental litellm-based engine, it works with many model providers, including OpenAI, Anthropic, Gemini, Bedrock, and Together.

What it does

PyTorch-style API: Variable, TextLoss, backward(), and a TGD (Textual Gradient Descent) optimizer
Backpropagation through natural-language feedback produced by an LLM
Optimizes many kinds of text variables — prompts, answers, code, and solutions
BlackboxLLM wrapper for running a model forward pass on a Variable
Experimental litellm engine supporting OpenAI, Anthropic, Gemini, Bedrock, Together, and more
Optional response caching and multimodal (image) input via the litellm engine
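
As a rough sketch of how the experimental litellm engine might be selected as the backward engine: the helper name use_experimental_backward_engine is hypothetical, and the "experimental:" model-name prefix and the cache keyword are assumptions based on the project's README at the time of writing, so confirm them against the official repo before relying on them. The tg parameter exists only so the sketch can be exercised without the package installed.

```python
def use_experimental_backward_engine(model_name, cache=False, tg=None):
    """Select the litellm-backed engine for the backward pass (assumed naming)."""
    if tg is None:
        import textgrad as tg  # imported lazily; requires `pip install textgrad`

    # Assumption: the "experimental:" prefix routes the model through litellm.
    engine_name = f"experimental:{model_name}"
    tg.set_backward_engine(engine_name, override=True, cache=cache)
    return engine_name
```

Calling use_experimental_backward_engine("gpt-4o") would then stand in for the plain set_backward_engine call used in the getting-started steps.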

Getting started

Install the package, set your model API key, then run a short optimize-the-answer loop that mirrors PyTorch.

Install TextGrad

Install from PyPI. You'll also need an API key (for example OpenAI or Anthropic) set in your environment.

bash

pip install textgrad
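
Before running anything, it can help to confirm that a key is actually visible to Python. The sketch below is only a convenience check: the helper name configured_keys is hypothetical, and OPENAI_API_KEY and ANTHROPIC_API_KEY are the variable names those providers conventionally use, so substitute whatever your provider documents.

```python
import os

# Assumed variable names; adjust to the provider you use.
CANDIDATE_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_keys(env=None):
    """Return the candidate key names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in CANDIDATE_KEYS if env.get(name)]


if not configured_keys():
    print("No API key found; export one before running the examples.")
```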

Set the backward engine and run a forward pass

Pick a model as the backward engine, then wrap your question in a Variable and get an initial answer from a BlackboxLLM.

python

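import textgrad as tg

# Engine that writes the textual feedback ("gradients") during backward()
tg.set_backward_engine("gpt-4o", override=True)

# Model that produces the answer in the forward pass
model = tg.BlackboxLLM("gpt-4o")
question_string = ("If it takes 1 hour to dry 25 shirts under the sun, "
                   "how long will it take to dry 30 shirts under the sun? "
                   "Reason step by step")

# The question is fixed input, so it does not need feedback
question = tg.Variable(question_string,
                       role_description="question to the LLM",
                       requires_grad=False)

# Forward pass: the answer comes back as a Variable that can be optimized
answer = model(question)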

Define a loss and optimize the answer

Write an evaluation instruction as a TextLoss, then run the backward pass and the optimizer step, using the same call pattern as PyTorch, to revise the answer. The revised text is then available as answer.value.

python

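# Describe the role of the variable being optimized
answer.set_role_description("concise and accurate answer to the question")

# TGD rewrites the listed variables using the feedback collected in backward()
optimizer = tg.TGD(parameters=[answer])
evaluation_instruction = (f"Here's a question: {question_string}. "
                          "Evaluate any given answer to this question, "
                          "be smart, logical, and very critical. "
                          "Just provide concise feedback.")

# The loss is an LLM evaluation of the answer, not a number
loss_fn = tg.TextLoss(evaluation_instruction)

loss = loss_fn(answer)  # evaluate the current answer
loss.backward()         # propagate the critique back to the answer
optimizer.step()        # revise the answer using that critique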
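
The step above performs a single revision. If one round is not enough, the same three calls can be repeated, as in the sketch below. The helper name refine and the default of three steps are illustrative assumptions, not part of TextGrad. It relies only on the objects already created above, plus optimizer.zero_grad(), which follows the PyTorch convention of clearing earlier feedback before a new round.

```python
def refine(answer, loss_fn, optimizer, steps=3):
    """Run several critique-and-revise rounds on `answer`."""
    for _ in range(steps):
        optimizer.zero_grad()   # clear feedback left over from the previous round
        loss = loss_fn(answer)  # evaluate the current answer
        loss.backward()         # propagate the critique back to the answer
        optimizer.step()        # rewrite the answer in place
    return answer
```

With the variables from the previous steps, refine(answer, loss_fn, optimizer) would run three rounds. Each round costs at least one evaluation call and one revision call to the model, so a small step count is a reasonable starting point.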

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Automatically refine an LLM's reasoning or answer when a first response is wrong
Tune system prompts and instructions instead of editing them by hand
Improve generated code or problem solutions through iterative LLM critique
Optimize text variables in research workflows across multiple model providers via litellm
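
For the system-prompt use case in the list above, the setup differs from the answer-optimization example mainly in which Variable is trainable. The sketch below is an assumption-laden illustration rather than the project's reference code: build_prompt_optimizer is a hypothetical helper, the role description is made-up wording, and the tg parameter exists only so the function can be exercised without the package installed.

```python
def build_prompt_optimizer(engine_name, initial_prompt, tg=None):
    """Create a model whose system prompt is the variable being optimized."""
    if tg is None:
        import textgrad as tg  # imported lazily; requires `pip install textgrad`

    # The system prompt is the trainable text, so it asks for feedback.
    system_prompt = tg.Variable(
        initial_prompt,
        requires_grad=True,
        role_description="system prompt that guides the model's reasoning",
    )
    model = tg.BlackboxLLM(engine_name, system_prompt=system_prompt)
    optimizer = tg.TGD(parameters=[system_prompt])
    return model, system_prompt, optimizer
```

From there the loop has the same shape as in the getting-started steps: run the model on a question, score the output with a TextLoss, call backward(), then step() to rewrite the prompt.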

How TextGrad compares

TextGrad alongside other open-source prompt programming tools AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
DSPy	★ 35.8k	A Stanford framework for programming language models with composable modules and automatic prompt optimization instead of hand-written prompts.
ell	★ 5.9k	A Python library that treats prompts as versioned functions, with tooling to track, visualize, and iterate on them as code.
GEPA	★ 5.5k	A reflective, evolutionary optimizer that improves prompts and other text components of a system using language-model feedback.
LMQL	★ 4.2k	A query language for LLMs that mixes Python control flow with prompts and constraints to script multi-step generation.
AdalFlow	★ 4.2k	A PyTorch-like library for building and auto-optimizing LLM pipelines, tuning prompts across the components of a task.
TextGrad	★ 3.6k	Optimize prompts and text variables with backpropagation through LLM feedback
Mirascope	★ 1.5k	A lightweight Python toolkit for writing LLM calls as typed functions with prompt templates, chaining, and a single interface across providers.
