In plain English
A prompt template is a prompt with blanks in it. You write the parts that never change once — the instructions, the tone, the output format — and leave named holes for the parts that change on every request: the user's question, today's date, a document you fetched. At runtime, your code fills the holes and sends the finished prompt to the model.
The everyday analogy is a mail-merge letter. A company doesn't hand-write "Dear Maria, your order #4821 has shipped" ten thousand times. It writes one letter — "Dear {name}, your order {order_id} has shipped" — and a machine stamps in the details. Prompt templates are exactly that, for model calls. One skeleton, infinite fillings.
If you've only used a chat interface, this might feel abstract: you type a prompt, you get an answer, done. But the moment you build an app on top of a model — a support bot, a summarizer, a code reviewer — you can't hand-type prompts anymore. Every single call your app makes is a template being rendered with fresh data. In practice, almost no production LLM call is a raw, hand-written string.
Why it matters
Templates solve three problems that show up the instant your prompt leaves the playground and enters a codebase.
- Copy-paste drift. Without a template, the "same" prompt gets pasted into five places in your code, each copy slowly mutating. Three months later nobody knows which version is live. One template, imported everywhere, kills this.
- Instructions tangled with data. Naive code glues user input straight into instruction text with
+and string concatenation. Templates force a clean separation: this part is mine (the scaffold), that part is input (the variables). That separation is the foundation for everything downstream — testing, logging, and security. - Untestable prompts. A string built inline from seven concatenations can't be diffed, versioned, or A/B tested. A template is a single artifact you can put in a file, review in a pull request, and version like code.
Who should care: anyone moving from "I prompt ChatGPT" to "I build things that call models." Templates are the bridge. They replaced the early pattern of ad-hoc string concatenation scattered across application code — the LLM equivalent of building HTML pages with "<b>" + name + "</b>", and just as regrettable.
There's also a quieter benefit: a template makes you see your prompt's structure. When the scaffold sits in one file with clearly marked holes, weak instructions and redundant sections jump out in a way they never do when the prompt only exists as fragments at runtime.
How it works
Every templating setup, from a one-line f-string to a full engine, does the same four things:
In a chat-style API you usually have two templates, not one: a template for the system prompt (role, rules, output format — mostly static) and a template for the user message (mostly variables). Keeping them separate mirrors how the API actually wants the conversation structured.
The sophistication ladder
Templating tools form a ladder, and you should climb only as high as your prompt demands:
| Level | Tools | What you get | Good for |
|---|---|---|---|
| Bare interpolation | Python f-strings, JS template literals | Variable substitution, nothing else | One-off scripts, prototypes |
| Safe substitution | string.Template, Mustache | Substitution that can't execute code; logic-less by design | Simple prompts, user-visible templates |
| Full engine | Jinja2, Handlebars | Conditionals, loops, includes, whitespace control | Prompts with optional sections or repeated blocks |
| LLM libraries | LangChain ChatPromptTemplate, prompt registries | Role-aware messages, few-shot helpers, versioning | Multi-message chat prompts in production apps |
Conditionals and loops are what push most real apps past bare interpolation. A retrieval app needs "include this block only if we found documents." A few-shot prompt needs "repeat this example block once per example." That's if and for — exactly what template engines were built for, decades before LLMs existed.
Build one in Python
Here's the same prompt at two rungs of the ladder. First, the naive version most people start with:
def support_prompt(product: str, user_message: str) -> str:
return f"""You are a support assistant for {product}.
Answer briefly. If you don't know, say so.
Customer message:
{user_message}
"""
print(support_prompt("Acme Cloud", "Where's my refund?"))This works until it doesn't. Want to include retrieved documents only when you have them? Want to ask for JSON output, which means literal { braces inside an f-string (hello, doubled {{ }} escapes everywhere)? Misspell a variable and Python raises at call time — if you're lucky. Time for an engine. Jinja2 (pip install jinja2) is the standard pick in Python:
from jinja2 import Environment, StrictUndefined
# StrictUndefined: a missing variable raises an error
# instead of silently rendering as blank text.
env = Environment(undefined=StrictUndefined, trim_blocks=True, lstrip_blocks=True)
TEMPLATE = env.from_string("""\
You are a support assistant for {{ product }}.
Answer briefly. If you don't know, say so. Do not guess.
{% if docs %}
Answer using only these documents:
{% for doc in docs %}
<doc id="{{ loop.index }}">
{{ doc }}
</doc>
{% endfor %}
{% endif %}
Customer message:
<message>
{{ user_message }}
</message>
""")
prompt = TEMPLATE.render(
product="Acme Cloud",
docs=["Refunds are processed within 5-7 business days."],
user_message="Where's my refund?",
)
print(prompt)Three things to notice. The {% if docs %} block disappears entirely when there are no documents — no dangling "Answer using only these documents:" header above empty space. The {% for %} loop scales from one document to fifty without touching the template. And StrictUndefined turns a forgotten variable into a loud crash instead of a quietly broken prompt — the single most valuable line in the file.
Pitfalls of naive interpolation
Most template bugs aren't exotic. They're the same five mistakes, made by everyone, usually discovered in production logs:
- Silent blanks. With
dict.get()defaults or Jinja's default undefined behavior, a missing variable renders as empty string. The model receives "Summarize the following:" followed by nothing — and cheerfully hallucinates a summary of nothing. Fail loudly on missing variables, always. - Dangling headers. Conditional content with an unconditional label:
Context: {context}where context is sometimes empty. The model sees a header promising information that never arrives, and some models will invent it. Make the whole block conditional, label included. - Brace collisions. F-strings and
str.format()treat{as syntax. The moment your prompt contains a JSON example, you're escaping every brace as{{ }}and the template becomes unreadable. Engines with distinct delimiters ({{ var }}vs literal text) sidestep this. - Whitespace mangling. Triple-quoted strings inherit your code's indentation; template
ifblocks leave stray blank lines. Models tolerate some noise, but messy rendering makes prompts hard to debug. Jinja'strim_blocksandlstrip_blocksoptions exist precisely for this. - Treating substitution as sanitization. It isn't. A template drops user text into your prompt verbatim — if that text says "ignore your instructions," it lands right next to your instructions.
Going deeper
Chat templates inside the model stack
Templates go deeper than your application layer. Open-weight chat models ship with a chat template baked into their tokenizer config — and it's literally a Jinja template. Hugging Face's apply_chat_template runs it to convert your list of role-tagged messages into the exact token sequence the model was trained on, special tokens and all. Get this formatting wrong when self-hosting and quality craters, because the model is seeing a conversation format it never saw in training. So even when you call a hosted API "without templates," there's a template under the floorboards.
Template order and prompt caching
Major providers cache long, repeated prompt prefixes to cut cost and latency — but caching only works on bytes that are identical across calls, starting from the top. That makes variable placement an economic decision: put the static scaffold (instructions, examples, schemas) first and the per-request variables last. A template with a timestamp on line one busts the cache on every single call. Design templates so everything stable is a contiguous prefix.
How much logic belongs in a template?
Mustache's whole philosophy is "logic-less": templates may substitute and loop, but never compute. There's wisdom in that for prompts. Every if branch in a template doubles the number of distinct prompts your app can produce — three conditionals means eight variants, each of which can regress independently and each of which your evals must cover. The pragmatic line: substitution, loops, and presence/absence checks live in the template; any real decision (which examples to include, how to truncate a document) lives in code, where you can unit-test it. Rendering should be a pure function: same variables in, same string out, snapshot-tested in CI.
Templates as deployable artifacts
At scale, teams pull templates out of the codebase entirely and into a prompt registry (LangSmith, Langfuse, and PromptLayer all offer one). The template becomes a versioned, deployable artifact: prompt v14 is pinned in production while v15 runs in an A/B test, and a bad prompt gets rolled back without a code deploy. The trade-off is operational — your app now has a runtime dependency on the registry, and a template fetched at runtime is one more thing that can fail. Most teams start with templates in git and graduate to a registry when non-engineers need to edit prompts.
FAQ
What is the difference between a prompt and a prompt template?
A prompt is the finished text a model actually receives on one specific call. A prompt template is the reusable recipe for producing it: fixed instructions plus named variables. Render the template with concrete values and you get a prompt. One template produces thousands of different prompts.
Should I use f-strings or Jinja for prompt templates?
F-strings are fine for prototypes with one or two variables and no optional sections. Switch to Jinja (or any real engine) the moment you need conditionals, loops over examples or documents, literal braces (JSON in prompts), or templates stored in separate files. The switch costs minutes and removes a whole class of escaping and whitespace bugs.
How do I handle optional sections in a prompt template?
Wrap the entire block — label included — in a conditional, e.g. Jinja's {% if docs %}...{% endif %}. The classic mistake is keeping the header unconditional ("Context:") while the content is optional, leaving the model a heading that points at nothing, which invites hallucination.
Can prompt templates cause prompt injection?
Templates don't cause injection, but they're where it enters: every variable filled with user-controlled or web-fetched text is an injection surface, and substitution does zero sanitization. Wrap untrusted variables in clear delimiters, keep instructions in the system message, and treat delimiters as a mitigation rather than a guarantee.
Where should I store prompt templates — in code or separate files?
Separate files in the same repo is the sweet spot for most teams: clean diffs, syntax highlighting, and prompt changes reviewed like code changes. Inline strings are acceptable for tiny prompts; a hosted prompt registry (LangSmith, Langfuse, PromptLayer) makes sense once non-engineers edit prompts or you need rollback without a deploy.
Why does my templated prompt render with missing or blank variables?
Most engines default to rendering missing variables as empty strings instead of erroring. In Jinja, construct your Environment with undefined=StrictUndefined so a forgotten variable raises immediately; in plain Python, prefer direct f-string references over dict.get() with silent defaults. A loud crash at render time beats a quietly broken prompt in production.