AI/TLDR

LMQL

A query language that blends Python control flow with LLM prompts and output constraints

Overview

LMQL is a programming language for large language models built as a superset of Python. Instead of stitching prompts together with string templates, you write code that reads like normal Python while top-level strings are sent to a model, and template variables such as [GREETINGS] are filled in by the LLM during execution.

It is aimed at developers who need more than a single prompt: people building multi-step generation, structured output, or agent-style logic where the program decides what to ask the model next. It sits in the prompt-programming corner of LLM orchestration, alongside templating and chaining tools, but pushes LLM calls down to the language level.

A key idea is the where keyword, which lets you attach constraints to generated text such as stopping phrases, length limits, or data types. LMQL also offers several decoding strategies (argmax, sample, beam search, best_k) and works with OpenAI, Azure OpenAI, and Hugging Face Transformers models.
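To make the idea of constrained decoding concrete, here is a minimal toy sketch of logit masking combined with argmax decoding. It does not use LMQL and is not how LMQL is implemented internally. The vocabulary, the scores, and the helper names (`is_digit_token`, `masked_argmax`) are all assumptions made up for illustration.

```python
import math

# Toy vocabulary and hand-written scores; a real model would produce the logits.
VOCAB = ["4", "2", "forty", "-", "two", "."]
LOGITS = [1.2, 0.7, 2.5, 0.1, 1.9, 0.3]

def is_digit_token(token: str) -> bool:
    """Hypothetical data-type constraint: allow only tokens made of digits."""
    return token.isdigit()

def masked_argmax(logits, vocab, allowed):
    """Set disallowed tokens to -inf, then pick the highest-scoring token."""
    masked = [
        score if allowed(tok) else -math.inf
        for score, tok in zip(logits, vocab)
    ]
    best = max(range(len(vocab)), key=lambda i: masked[i])
    return vocab[best]

# Without a constraint the top-scoring token wins.
unconstrained = masked_argmax(LOGITS, VOCAB, lambda tok: True)
# With the constraint, only digit tokens can be selected.
constrained = masked_argmax(LOGITS, VOCAB, is_digit_token)

print(unconstrained)  # forty
print(constrained)    # 4
```

The point of masking before selection, rather than filtering afterwards, is that the decoder never emits a token that violates the constraint in the first place.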

What it does

  • Python-based syntax: queries are written in a superset of Python, so classes, variable captures, and control flow all work natively
  • Output constraints via the where keyword: limit length, enforce stopping phrases, character-level rules, and data types using logit masking
  • Multiple decoding algorithms including argmax, sample, beam search, and best_k
  • Sync and async APIs that can run many queries in parallel with cross-query batching
  • Multi-model support for OpenAI API, Azure OpenAI, and Hugging Face Transformers
  • Integrations with LangChain and LlamaIndex, plus a browser-based Playground IDE and a VS Code extension

Getting started

LMQL needs Python 3.10. Install it with pip, then launch the Playground IDE or write a query in Python.

Install LMQL

Install the latest release with pip. Python 3.10 must be available.

```bash
pip install lmql
```

Add local GPU support (optional)

To run models on a local GPU, install in an environment with a GPU-enabled PyTorch >= 1.11 and use the hf extra.

```bash
# Quote the extra so shells such as zsh do not treat the brackets as a glob
pip install "lmql[hf]"
```

Launch the Playground IDE

After installing, start the interactive Playground to write and run queries in the browser. This requires Node.js to be installed.

```bash
lmql playground
```

Write a query

An LMQL program reads like Python; top-level strings are sent to the model and bracketed variables are completed by it. The where clause constrains the output.

```python
"Greet LMQL:[GREETINGS]\n" where stops_at(GREETINGS, ".") and not "\n" in GREETINGS

"To summarize:[SUMMARY]"
```
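The following plain-Python sketch illustrates roughly what the query above means: literal text builds up the prompt, each bracketed variable is filled by a model call, and a stopping phrase ends the variable's value. It is a toy emulation under stated assumptions, not LMQL itself. `fake_model`, `stop_at`, and `run_template` are hypothetical helpers, the model reply is canned, and the stop is applied by truncating afterwards, whereas LMQL enforces constraints during decoding.

```python
import re

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned continuation."""
    return " Hello there. How are you today?\nBye"

def stop_at(text: str, phrase: str) -> str:
    """Cut text right after the first occurrence of phrase, keeping the phrase."""
    idx = text.find(phrase)
    return text if idx == -1 else text[: idx + len(phrase)]

def run_template(template: str, stops: dict) -> dict:
    """Fill each [VAR] placeholder in order, feeding earlier output back in."""
    prompt, values = "", {}
    # re.split with a capture group alternates literal text and variable names
    parts = re.split(r"\[([A-Z_]+)\]", template)
    for i, part in enumerate(parts):
        if i % 2 == 0:
            prompt += part  # literal text is appended to the prompt as-is
        else:
            completion = fake_model(prompt)
            if part in stops:
                completion = stop_at(completion, stops[part])
            values[part] = completion
            prompt += completion  # later variables see earlier completions
    return values

result = run_template("Greet LMQL:[GREETINGS]\n", {"GREETINGS": "."})
print(result)  # {'GREETINGS': ' Hello there.'}
```

Because the value is cut at the first period, it also contains no newline, which matches the second condition in the where clause.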

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Scripting multi-step generation where program logic decides the next prompt based on earlier model output
  • Producing structured or schema-safe output (for example JSON) by constraining the model with the where clause
  • Running hundreds of queries in parallel with the async API and cross-query batching
  • Prototyping prompting strategies interactively in the Playground IDE before wiring them into a LangChain or LlamaIndex stack
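The parallel-query use case can be sketched with the standard library's `asyncio`. This is a minimal illustration of fanning out many async calls with a concurrency cap. `fake_query` is a hypothetical stand-in for an async query function and calls no model, and the sketch does not reproduce LMQL's cross-query batching.

```python
import asyncio

async def fake_query(question: str) -> str:
    """Hypothetical stand-in for an async query; no model is called."""
    await asyncio.sleep(0.01)  # simulate waiting on a model backend
    return f"answer to: {question}"

async def run_all(questions, max_concurrency: int = 8):
    """Run all queries concurrently, capping how many are in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(q):
        async with semaphore:
            return await fake_query(q)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(q) for q in questions))

questions = [f"question {i}" for i in range(100)]
answers = asyncio.run(run_all(questions))
print(len(answers), answers[0])  # 100 answer to: question 0
```

The semaphore is a design choice for this sketch: it keeps a large batch from overwhelming a rate-limited backend while still overlapping the waits.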

How LMQL compares

LMQL alongside other open-source prompt programming tools AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| DSPy | ★ 35.8k | A Stanford framework for programming language models with composable modules and automatic prompt optimization instead of hand-written prompts. |
| ell | ★ 5.9k | A Python library that treats prompts as versioned functions, with tooling to track, visualize, and iterate on them as code. |
| GEPA | ★ 5.5k | A reflective, evolutionary optimizer that improves prompts and other text components of a system using language-model feedback. |
| LMQL | ★ 4.2k | A query language that blends Python control flow with LLM prompts and output constraints. |
| AdalFlow | ★ 4.2k | A PyTorch-like library for building and auto-optimizing LLM pipelines, tuning prompts across the components of a task. |
| TextGrad | ★ 3.6k | A library that optimizes prompts and other text variables using textual gradients, applying a backpropagation-like loop driven by LLM feedback. |
| Mirascope | ★ 1.5k | A lightweight Python toolkit for writing LLM calls as typed functions with prompt templates, chaining, and a single interface across providers. |