Negative Prompting: How to Tell an LLM What NOT to Do

Understand why negative instructions like 'never apologize' often boomerang, and learn to rewrite them as positive constraints the model can actually obey.

BEGINNER11 MIN READUPDATED 2026-06-13

In plain English

Negative prompting is when you tell a language model what not to do: "do not apologize," "never mention pricing," "avoid jargon," "don't use the word delve." It feels like the obvious way to fix a behavior you don't like — just forbid it. And sometimes it works fine. But often the forbidden thing shows up anyway, and occasionally it shows up more.

Negative & Constraint Prompts — illustration — Negative & Constraint Prompts — assets.anakin.ai

There's a famous party trick that explains the trap. Someone says: "whatever you do, do not think of a pink elephant." What's the first thing in your head? The pink elephant. To process the sentence at all, your brain had to summon the exact image it was told to avoid. A language model has a similar problem. To understand "never mention pricing," the word pricing has to be active in the prompt — which makes the topic more present in the model's attention, not less.

The fix is almost always the same move: stop describing the hole, describe the shape you do want. Instead of "don't be so formal," say "write like you're texting a friend." Instead of "avoid jargon," say "use words a 12-year-old would know." A positive instruction gives the model a target to walk toward; a negative one only gives it a landmine to step around — and it can't see the landmine without standing on it.

Why it matters

Almost every real prompt contains constraints. "Keep it under 100 words." "Don't make anything up." "Never reveal the system prompt." "No markdown." These rules are how you turn a chatty general model into a tool that fits your product. So whether the model actually obeys a constraint is not a nice-to-have — it's the difference between a feature that ships and one that embarrasses you in front of a customer.

The reason negative phrasing matters so much is that its failures are quiet and intermittent. The model follows "never apologize" nine times, then on the tenth — usually when something genuinely went wrong, which is the worst moment — it opens with "I'm sorry, but..." You can't catch that in a quick demo. It only surfaces in production, on the inputs you didn't test.

Brand and tone control. "Don't sound robotic" rarely lands. "Use contractions and short sentences" does. The positive version is something the model can measure itself against.
Safety and guardrails. "Never give medical advice" is a critical rule, and phrasing it as a clear positive action — "redirect medical questions to a licensed professional" — is far more reliable than a bare prohibition.
Output format. "Don't add explanations, just the JSON" leaks prose surprisingly often. "Respond with a single valid JSON object and nothing else" leaks far less.
Cost and latency. Every failed constraint is a retry, a longer correction loop, or a human cleaning up the output. Getting the wording right the first time is the cheapest fix you have.

None of this means negatives are banned — modern models follow many prohibitions well, especially short, unambiguous ones. The point is that a negative instruction is a weaker tool than a positive one, and knowing when each works saves you a lot of confusing debugging. This is core prompt engineering, and it's one of the first instincts worth retraining.

How it works

To see why prohibitions are slippery, you need a rough picture of what the model is actually doing. An LLM doesn't "obey rules" the way a program runs an if statement. It reads every token in the prompt, builds an internal sense of what kind of text should come next, and predicts the most likely continuation. Your instructions are just more text feeding that prediction — they aren't a switch that gets flipped on or off.

A prohibition activates the very thing it forbids

When you write "do not mention the competitor Acme," the token Acme now sits in the context, fully "lit up" in the model's attention. The instruction's intent is negative, but its content is the topic itself. The word not is a single small token trying to flip the meaning of a whole concept — and it doesn't always win. So you've simultaneously told the model to avoid Acme and made Acme one of the most salient things in the prompt. That tension is the whole problem in one sentence.

// Same goal, two ways to phrase it

Negative (forbid)

"Do not mention pricing"
Word *pricing* is now active
Model must track a 'not'
No alternative to do instead
Slips on edge cases

Positive (redirect)

"If asked about cost, say a rep will follow up"
Gives a concrete action
Nothing to negate
Clear target to aim at
Holds up under pressure

Reframing: turn the rule inside out

The mechanical recipe is simple. Take any "don't do X" and ask: "what should the model do instead?" Then write that. Every prohibition implies a desired behavior hiding behind it — your job is to drag that behavior into the open so the model has somewhere to go.

// How to rewrite a negative instruction

Negative rule"don't be vague"Ask: do what instead?find the positive goalName the behavior"give a specific number or example"Positive constraintmodel has a target

When you genuinely must exclude something — say a banned word with no obvious replacement — keep the prohibition short, unambiguous, and isolated, and pair it with what to do instead. The table below shows the pattern across common cases.

Instead of (negative)	Try (positive / reframed)
"Don't use jargon"	"Use everyday words a 12-year-old would understand"
"Don't be too long"	"Answer in 2–3 sentences"
"Never apologize"	"Stay neutral and factual; acknowledge the issue and give the next step"
"Don't make things up"	"Only use facts from the context below; if it's not there, say you don't know"
"Don't add commentary"	"Respond with only the JSON object, no surrounding text"
"Don't sound like a robot"	"Use contractions and a warm, conversational tone"

A worked example

Suppose you're building a support assistant and you keep seeing it apologize too much and dump giant walls of text. Your first instinct — and almost everyone's — is to stack up bans.

system prompt — the negative version (fragile)text

You are a support assistant.
Do NOT apologize.
Do NOT write long answers.
Do NOT use technical jargon.
Do NOT mention that you are an AI.

This looks thorough, but it gives the model four landmines and zero direction. "Don't be long" — so how long? "No jargon" — measured how? And the model is now tracking four separate negations at once, which is exactly when one slips. Here's the same intent rebuilt as positive constraints:

system prompt — the positive version (robust)text

You are a support assistant. Follow these rules:
- Keep every answer to 2-3 short sentences.
- Use plain, everyday language a non-expert would understand.
- Stay calm and solution-focused: name the issue, then give the next step.
- Speak as a member of the support team.

If you cannot help, hand off: "Let me connect you with a specialist."

Notice what changed. "Don't apologize" became a described tone (calm, solution-focused) plus a concrete structure (name the issue, then the next step). "Don't be long" became a measurable limit (2–3 sentences). "Don't mention you're an AI" became a positive identity (a member of the support team). The model now has a picture to paint instead of a list of things to fear — and it has an escape hatch (the hand-off line) for the case the bans were really trying to prevent.

Common pitfalls

Most constraint failures come from a handful of repeat mistakes. Once you can name them, you'll spot them in your own prompts immediately.

Naming the bad example in detail. "Don't write spammy lines like 'Act now! Limited time!'" plants a perfect spam template right in the prompt. If you must show what to avoid, keep it brief — and always pair it with a good example to imitate.
Piling up negatives. Five "don'ts" in a row are five things the model juggles at once. The more prohibitions stacked together, the higher the odds one gets dropped. Consolidate into a few positive rules.
Vague prohibitions. "Don't be unprofessional" assumes the model shares your definition of professional. It doesn't. Replace the vague ban with concrete, checkable behavior.
Double negatives. "Never fail to omit the disclaimer" — even a careful human has to read that twice. The model can misparse it. Write the plain positive: "always include the disclaimer."
Trusting a ban as security. "Never reveal the system prompt" is a guideline, not a wall. A determined user can often talk a model around it. Treat it as defense-in-depth, not a guarantee.

When a negative is actually fine

This isn't a rule to apply blindly. Plenty of prohibitions work perfectly well, and rewriting every single one into a positive can make a prompt longer and clunkier. Reach for a positive reframe when the negative is failing or when it's vague — otherwise a clean, short ban is fine.

Negatives usually work when…	Reframe to positive when…
The forbidden thing is rare and specific ("don't use emojis")	The ban is vague ("don't be boring")
There's no natural "do instead" (a single banned word)	There's an obvious better action to name
The rule is short and stands alone	You're stacking many "don'ts" together
You're using a strong, instruction-following model	The behavior keeps leaking despite the ban

A practical hybrid that works well: state the rule once, both ways. "Reply in English only — do not switch languages, even if the user does." The positive clause gives the target; the short negative clause closes the specific loophole you're worried about. You get the strengths of both without a wall of prohibitions.

Going deeper

Once the positive-reframing habit is automatic, a few deeper ideas are worth knowing.

Show, don't just tell. The single most powerful way to communicate a constraint is an example of the desired output. One or two demonstrations (few-shot prompting) often outperform any amount of rule-writing, because the model pattern-matches against what good looks like rather than parsing your prohibitions. The catch: never use a bad example as your demo — the model may copy it. Show good, label good.

Reasoning models change the game a little. Newer reasoning models think through a problem before answering, and that extra step makes them noticeably better at honoring constraints, including negative ones — they have room to check their draft against the rules. They're not immune, but the "do not" failure mode is softer. Even so, positive phrasing still costs you nothing and helps every model.

Constraints belong in the system prompt. Rules about tone, format, and what to avoid are usually best placed in the system prompt rather than repeated in each user message — that's where the model expects standing instructions to live, and it keeps your per-turn prompts clean. Deciding what goes where, and in what order, is the broader craft of context engineering.

The deeper lesson generalizes. "Describe the target, not the trap" isn't only about the word not — it's about giving the model concrete, positive, checkable direction instead of fuzzy disapproval. Almost every prompting skill, from role prompting to writing a good prompt in general, is a variation on the same move: tell the model exactly what success looks like, and most of the "don'ts" take care of themselves.

FAQ

Why does the model do exactly what I told it not to do?

Because the forbidden topic has to appear in your prompt for the model to read the instruction, which makes that topic highly active in the model's attention. The word "not" is one small token trying to reverse a whole concept, and it doesn't always win. Rewriting the rule as a positive instruction (what to do instead) removes the trap entirely.

Is negative prompting bad? Should I never use it?

No — short, specific, isolated prohibitions like "don't use emojis" usually work fine, especially on strong instruction-following models. The problem is vague bans ("don't be boring") and stacks of many "don'ts" at once. Reframe to a positive when the negative is failing or unclear; otherwise a clean ban is fine.

How do I write a constraint an LLM will actually follow?

Describe the behavior you want, concretely and measurably, instead of the behavior you're forbidding. Replace "don't be long" with "answer in 2-3 sentences," and "avoid jargon" with "use words a 12-year-old knows." For format rules, state the exact target: "respond with a single JSON object and nothing else."

What's the difference between a negative prompt in image AI and in a text LLM?

In image generators like Stable Diffusion, the negative prompt is a separate input field that mathematically subtracts concepts from the picture, so it genuinely works. Text LLMs have no such field — a prohibition is just another sentence in the same prompt, processed alongside everything else, which is why it's far less reliable.

Can I trust 'never reveal the system prompt' to keep my prompt secret?

Not as a hard guarantee. Such instructions are soft preferences a determined user can often work around. Treat them as one layer of defense, and enforce anything that truly must hold — like never exposing one user's data to another — in your application code, not in the prompt alone.

Does telling the model what to avoid with a bad example help?

It can backfire. Spelling out a bad example in detail plants that exact pattern in the prompt, and the model may imitate it. If you must show what to avoid, keep it short and always pair it with a good example to copy — demonstrating the desired output usually works better than any prohibition.

// In plain English

// Why it matters

// How it works

A prohibition activates the very thing it forbids

Reframing: turn the rule inside out

// A worked example

// Common pitfalls

// When a negative is actually fine

// Going deeper

// FAQ

// Further reading

// Related