In plain English
Most people blame the model when they get a bad answer. Nine times out of ten, the problem is the prompt. A language model is not a mind-reader — it predicts the most plausible continuation of whatever text you give it. Feed it vague text, and it produces vague output. Feed it conflicting instructions, and it picks one at random. Feed it nothing about your situation, and it invents a situation that fits.
Think of it like giving directions to a stranger. "Go to the nice coffee shop" fails because nice is subjective and the stranger doesn't know your city. "Walk two blocks north, turn left at the pharmacy, and it's the green building on the corner" succeeds. Both are requests for the same thing. The difference is precision and context — and that is exactly what separates a good prompt from a bad one.
The ten mistakes below cover the overwhelming majority of cases where beginners get disappointing output. Each one has a recognisable symptom and a one-sentence fix. Master these ten, and you will find yourself iterating far less — and blaming the model far less too.
Why it matters
Prompting is the only lever most users have over a language model. You cannot retrain it, adjust its weights, or change its architecture. What you can do is control the exact text it receives — and that turns out to matter enormously. Small wording changes can shift output quality by more than switching to a more expensive model.
The cost of bad prompts is not just bad answers in one conversation. In production applications — chatbots, summarisers, code assistants — a flawed prompt runs thousands of times a day, silently producing mediocre output at scale. Teams spend weeks trying new models or tweaking temperature when the real fix was a prompt that took ten minutes to rewrite.
Learning to recognise these failure patterns is the fastest skill upgrade available to anyone building with LLMs. It does not require code, does not require a PhD, and pays off immediately in better answers, fewer follow-ups, and more predictable behaviour.
How the ten mistakes work
Every prompting mistake maps to one of three root causes: the model lacks information, the model receives conflicting signals, or the model's output is left unconstrained. The diagram below shows these three buckets and which of the ten mistakes falls into each:
Knowing which bucket your prompt falls into narrows the fix. If the output is vague or off-topic, you are probably missing information. If the model ignores an instruction or oscillates between styles, you likely have conflicting signals. If the answer is the right content but the wrong shape, your output is unconstrained.
The ten mistakes are not equally common. Mistakes 1–3 (vagueness, missing context, no format) account for the large majority of beginner frustration. Work through those first before optimising for the subtler issues at the bottom of the list.
The ten mistakes — and their fixes
1. Asking vaguely
Symptom: The output is generic, surface-level, or not what you pictured. Example: "Write something about climate change." The model picks a random angle, length, and tone because none were specified.
Fix: Replace subjective adjectives with concrete nouns and numbers. Instead of "write something," write "write a 200-word explainer for a 12-year-old covering the greenhouse effect and one practical action they can take." Length, audience, topic, and call-to-action are now unambiguous.
2. Leaving out context
Symptom: The model's answer would be correct in some imaginary situation but not yours. Example: "Summarise this for my team" — the model doesn't know if your team is engineers, executives, or kindergarteners.
Fix: Provide the who, what, and why before the task. A reliable formula: "I am [role]. I am working on [project]. My audience is [description]. Please [task]." The model uses this framing to calibrate every word of its response.
3. Not specifying output format
Symptom: The model writes flowing prose when you needed a bullet list, or a list when you needed a table, or plain text when you needed JSON. The content is right; the shape is useless.
Fix: Describe the output format explicitly at the end of your prompt. "Return your answer as a JSON array of objects, each with keys title, date, and summary." If you want Markdown headings, say so. If you want no Markdown at all, say that too.
4. Burying the instruction in the middle
Symptom: The model ignores a specific instruction that is somewhere in a long prompt — especially a constraint or a format rule. This is the "lost in the middle" effect: research from Stanford and UC Berkeley found a U-shaped attention pattern, where models attend strongly to the start and end of a context but poorly to material in the centre.
Fix: Put the task instruction on line one. Place long background text (documents, conversation history, data dumps) after the instruction. If a rule is critical, repeat it at the end as well.
5. Using negatives instead of positives
Symptom: You tell the model "do not use bullet points" and it produces bullet points anyway, or it over-compensates and strips structure you wanted. Researchers have named this the pink elephant problem — asking someone not to think about something makes it more salient, and the same pattern appears in LLM instruction following.
Fix: Reframe negatives as positives. "Do not use bullet points" becomes "use continuous paragraphs." "Don't be too formal" becomes "write in a friendly, conversational tone." Positive instructions describe a target state; negative instructions describe everything except a target state.
6. Giving contradictory instructions
Symptom: The output mixes styles unpredictably — formal in one paragraph, casual in the next — or applies one rule and silently drops a conflicting one.
Fix: Read your own prompt before sending it and ask: do any two instructions conflict? Common collisions: "be concise" plus "be thorough," "avoid jargon" plus "use industry terms," "short answer" plus a long list of things to cover. Resolve conflicts explicitly: "Be thorough on causes, but give the solution in a single sentence."
7. Applying chain-of-thought to models that don't need it
Symptom: You prepend "Think step by step" to every request, and reasoning-native models (like OpenAI o3, o4-mini, or Claude 3.7 Sonnet with extended thinking enabled) become verbose and slow without quality improvement — or worse, the visible reasoning chain confuses the output.
Fix: Reasoning models already think step by step internally — instructing them to do so again in the visible output adds noise. Save chain-of-thought instructions for standard models (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro) where visible reasoning genuinely improves accuracy. Check your model's documentation.
8. Ignoring the system prompt
Symptom: You paste a massive blob of instructions at the start of every user message, the model sometimes ignores parts of them, and your chat history is noisy.
Fix: Move persistent, session-wide instructions (persona, rules, format defaults) into the system prompt. Use the user turn for the actual task. System-prompt instructions carry more consistent weight across turns because they are set once and the model treats them as the operating context rather than content to summarise.
9. Skipping examples when the task is nuanced
Symptom: You describe a task carefully in words, but the output tone or structure is still slightly off — the model's interpretation of "like the style of a legal brief" differs from yours.
Fix: Show, don't just tell. Add one or two input/output examples (few-shot prompting). A concrete example of what you want outweighs a paragraph of description. Even a single example anchors the model's interpretation of ambiguous style or format instructions far more reliably than adjectives can.
10. Not iterating — or iterating randomly
Symptom: After one bad output, you rewrite the whole prompt from scratch, or add more and more text hoping to fix it, with no record of what changed or why.
Fix: Change one thing at a time. Keep a copy of the old prompt. Compare the outputs side by side. When you find a change that helps, keep it; when it doesn't, revert it. This is the discipline of systematic prompt iteration — the same logic as controlled experiments — and it is the only way to learn what your specific model actually responds to.
Before and after: three rewrites
The quickest way to internalise these fixes is to see them applied. Each example below shows a weak prompt, names the mistakes it contains, and shows the fixed version:
| Weak prompt | Mistakes present | Fixed prompt |
|---|---|---|
| Explain machine learning. | Vague (1), no format (3), no context (2) | Explain machine learning in 3 short paragraphs for a business analyst who has never written code. Use plain English and no maths. |
| Don't make it too long, be comprehensive, and don't use jargon but use technical accuracy. | Contradictory instructions (6), negatives (5) | Write 150–200 words. Cover every required point. Use plain English; define any technical term the first time you use it. |
| Here is a 5,000-word document. Answer my question: what is the main argument? Also, be concise. | Instruction buried (4), conflicting (6) | What is the main argument of the document below? Answer in one sentence. [document] |
Going deeper
The ten mistakes are symptoms of one meta-mistake: treating the model as a person. People fill in gaps with social knowledge, shared history, and common sense. Language models fill in gaps with statistical patterns from training data — which may or may not match your situation. The mental shift from "talking to a clever colleague" to "writing a precise specification for a powerful but context-free text machine" changes how you write prompts immediately and permanently.
Prompt sensitivity is real and asymmetric. Research consistently shows that small surface changes — adding a period, rephrasing a constraint, changing word order — produce disproportionately large output changes. This is not a bug you can work around; it is a fundamental property of autoregressive models. The practical response is to maintain a small test set of representative inputs and re-run them when you edit a prompt, so you notice when a "minor tweak" silently changed behaviour across the board.
Aggressive formatting in instructions can backfire. Caps-lock rules like "YOU MUST NEVER" or "ALWAYS FOLLOW THIS EXACTLY" do not reliably increase compliance; in many models they trigger over-refusal or produce stilted, over-hedged output. Calm, declarative phrasing works better: "The response must be in JSON." "Use a professional tone throughout." Reserve emphasis for the single most critical constraint, and only use it once.
Beyond the ten: structured prompting patterns. Once you have eliminated the ten mistakes, the next level of reliability comes from formal prompt structure — wrapping sections in XML tags (<instructions>, <context>, <examples>), using role prompting to anchor tone, and learning the difference between zero-shot, one-shot, and few-shot prompting for different task types. These patterns move you from debugging individual prompts to designing reusable, testable prompt templates.
Version your prompts. The most overlooked practice for anyone who iterates seriously: keep every version of a prompt, with a note about what changed and why. Without a log, you cannot distinguish a genuine improvement from a fluke. With a log, every bad output is a data point that makes the next version better. Prompt management — the discipline of versioning and testing prompts like code — is where systematic prompt engineering begins.
FAQ
Why does the model keep ignoring my instructions?
The most common causes are buried instructions (placed in the middle of a long prompt where attention is weakest), contradictory instructions (the model picks one and silently drops the other), and instructions in the user turn rather than the system prompt. Move critical rules to the beginning and end of your prompt, resolve contradictions, and put session-wide rules in the system prompt.
Why are my LLM outputs so generic even when I think my prompt is detailed?
Detail and specificity are not the same thing. A long prompt full of adjectives like "thorough," "professional," and "engaging" is still vague — the model has no concrete target to hit. Replace adjectives with measurable constraints: word count, structure, audience reading level, required topics, and a concrete example of the desired output.
Does adding more text to a prompt always make it better?
No. Extra text adds noise alongside the signal, increases the chance of contradictions, and pushes critical instructions toward the middle of the context where models attend to them less reliably. Shorter prompts that are specific and concrete routinely outperform longer ones that are verbose and vague.
Why does the model add disclaimers and caveats I didn't ask for?
Models are trained to add safety and hedging language by default. You can suppress this explicitly: "Do not add disclaimers, caveats, or suggestions to seek professional advice unless I ask for them." Positive framing helps too: instead of "don't hedge," try "answer directly and concisely as if speaking to a knowledgeable adult."
Is it better to use "do not" instructions or rephrase them positively?
Positive instructions are more reliable. "Do not use bullet points" makes the model think about bullet points and try to avoid them; "write in continuous prose" gives the model a concrete target. Research on the so-called pink elephant problem in LLM instruction following confirms that positive descriptions of the desired state outperform negative constraints, especially for style and format.
Do these prompting mistakes apply to all models, or just one?
All of them. The ten failure patterns — vagueness, missing context, absent format, buried instructions, negative framing, contradictions, wrong reasoning technique, system-prompt neglect, no examples, and random iteration — are universal properties of how autoregressive language models work. The specific phrasing of fixes may vary slightly by model, but the root causes do not.