In plain English
A foundation model is a massive neural network trained on a huge chunk of the internet — books, code, articles, forum posts — for the sole purpose of learning to predict the next token in a sequence. It absorbs patterns from trillions of words without being told what any particular piece of text means or what it should do with it. The result is a model that has compressed an enormous slice of human knowledge into its weights, like a very well-read student who has never been taught how to answer exam questions politely.

That raw, pre-trained model is called the base model (sometimes the pretrained model). It's useful, but it's unpredictable. Ask it a question and it might complete your sentence sideways, list three more questions, or launch into an essay — because every one of those was a plausible continuation of text it saw during training.
Think of it like a new hire who has read every book in the company library but has never worked a day in the office. They know an enormous amount. They just haven't learned how to respond to a manager's question yet.
Labs then run a second round of training that teaches the model to follow instructions. This produces an instruct model — the same underlying knowledge, but now reliably packaged into direct, task-following responses. The instruct model learned the office norms. Ask it "summarize this contract" and it summarizes; it doesn't start reciting the contract's header.
A chat model takes that one step further by optimizing for multi-turn dialogue — remembering what was said earlier in a conversation, handling follow-up questions, and maintaining a consistent persona across many exchanges. In practice, the distinction between "instruct" and "chat" has blurred considerably; most modern releases that use either label are trained on both instruction-following data and conversational dialogue.
Why it matters
If you call an API like the Anthropic API or the OpenAI API, you are almost certainly talking to an instruct or chat model, not a raw base model. Labs rarely expose raw base models through their flagship APIs, because base models require careful prompt engineering just to behave sensibly for end users.
But the distinction matters a lot when you start building:
- Fine-tuning on top of a base model gives you maximum flexibility — the model has no prior opinions about how to respond, so your custom training data shapes behavior from scratch. This is the approach for domain-specific models that need to speak in a very specific voice or follow a non-standard output format.
- Fine-tuning on top of an instruct model is a much lighter lift. The instruction-following capability is already baked in; you only need to teach the content of your domain, not how to behave. Usually 300–1,000 high-quality examples is enough.
- Calling a chat model directly — with no fine-tuning, using system prompts and few-shot examples in context — is how most production applications start. It's the fastest path to something useful.
Knowing which variant you have also sets expectations. A base model completing a sentence is not "wrong" when it adds a paragraph of unrelated text — that's what a text completer does. An instruct model ignoring your format instructions is something worth debugging.
How it works
Every model goes through at least two phases. Most production models go through three.
Phase 1: Pre-training
The model starts with randomly initialized weights and is shown an enormous corpus — typically trillions of tokens scraped from the web, books, code repositories, and academic papers. At every step, the model predicts the next token; the difference between its prediction and the correct answer is used to nudge the weights slightly in the right direction. Repeat billions of times. The model never sees explicit labels like "this sentence is positive" or "this is a Python function" — the supervision signal comes entirely from the text itself, which is why this phase is called self-supervised learning.
What emerges is a base model: an extraordinarily capable text completer that has implicitly learned grammar, factual associations, coding patterns, reasoning chains, and much else — all as a side effect of getting good at next-token prediction.
Phase 2: Supervised fine-tuning (SFT)
A small, carefully curated dataset of (instruction, ideal response) pairs is assembled — sometimes tens of thousands, sometimes hundreds of thousands of examples. Human writers craft examples like: instruction = "Translate this paragraph to French", ideal response = the French translation. The model is then fine-tuned on these pairs using the same next-token prediction objective.
After SFT, the model has internalized the shape of helpful responses: it knows to answer the question directly, format code in a code block, and stop when it's done rather than rambling on. This is what creates an instruct model.
Phase 3: Alignment (RLHF or DPO)
SFT teaches the model what kind of responses exist; alignment training teaches it which responses humans actually prefer. In Reinforcement Learning from Human Feedback (RLHF), human raters compare pairs of model outputs and pick the better one. Those preferences train a reward model, which scores responses. The main model is then updated via reinforcement learning to score higher on that reward signal.
Direct Preference Optimization (DPO) is a newer alternative that skips the separate reward model and optimizes the language model directly on the preference data. DPO reduces training complexity and compute cost by 40–75% compared to full RLHF while reaching comparable alignment quality.
The output of this phase is the aligned chat model: helpful, honest, and much less likely to produce harmful content. This is what you get when you call claude-sonnet-4-6 or gpt-5.5 through an API.
How APIs expose these variants
Different labs make different choices about which variants they publish. Understanding their naming patterns saves confusion when you're picking a model.
| Lab / Model Family | Base model available? | Instruct / Chat model | Notes |
|---|---|---|---|
| Meta Llama 4 | Yes — base checkpoint (no suffix) | the matching -Instruct variant | Both released openly on Hugging Face |
| Mistral | Yes — Mistral-7B-v0.1 | Mistral-7B-Instruct-v0.2 | Both on Hugging Face; instruct adds [INST] chat template |
| OpenAI GPT-5 | No | gpt-5.5 (chat/instruct) | Only aligned model exposed via API |
| Anthropic Claude | No | claude-* (all aligned) | No public base model; API always chat-aligned |
| Google Gemma 3 | Yes — gemma-3-9b-pt | gemma-3-9b-it | pt = pretrained base; it = instruction-tuned |
The pattern is clear: closed-source labs (OpenAI, Anthropic, Google Gemini) only expose aligned models through their APIs. Open-weight labs (Meta, Mistral, Google Gemma) release both, giving developers the choice to fine-tune from the base or build on top of the instruct variant.
Which variant should you use?
The answer depends on what you're building and how much control you need over the model's behavior.
When to pick instruct over base for fine-tuning
If your fine-tuning dataset has fewer than a few thousand examples, start from the instruct model. The instruction-following behavior is already baked in; your training data only needs to teach what to say, not how to say it well. Starting from the base model with limited data often produces a model that regresses on general instruction-following while gaining only modest domain knowledge.
The main reason to start from a base model is when your application genuinely needs behavior that conflicts with the instruct model's alignment — for example, generating synthetic data for red-teaming, building a completion engine with custom turn-taking, or a highly specialized format where the instruct model's formatting habits interfere.
System prompts vs fine-tuning
Before reaching for fine-tuning at all, try a system prompt. Modern aligned models respect detailed instructions in the system prompt with surprising fidelity. Write a system prompt that specifies the persona, output format, tone, and constraints. Test it with 20–30 representative inputs. Fine-tuning is expensive and slow to iterate; a prompt change takes seconds. Only move to fine-tuning when you've established that the system prompt ceiling is genuinely too low for your use case.
import anthropic
client = anthropic.Anthropic()
# All Claude API calls reach an aligned chat model — never a raw base model.
# The system prompt shapes behavior; the model already knows how to follow instructions.
response = client.messages.create(
model="claude-opus-4-8",
max_tokens=512,
system="You are a concise technical writer. Reply in bullet points only. No prose.",
messages=[
{"role": "user", "content": "What is the difference between base and instruct models?"}
]
)
print(response.content[0].text)Going deeper
The base / instruct / chat taxonomy is tidy, but real training pipelines are messier. A few things worth knowing as you go further:
Constitutional AI and RLAIF
Anthropic's models are aligned using Constitutional AI (CAI) — a variant of RLHF where, instead of only relying on human raters, the model is asked to critique and revise its own outputs against a set of principles (the "constitution"). A second model then scores the revised outputs, producing preference data at scale without requiring a human to evaluate every pair. This is sometimes called Reinforcement Learning from AI Feedback (RLAIF).
Chat templates and special tokens
Instruct models learn not just what a helpful response looks like, but also how conversations are structured. Each model family defines a chat template — special tokens that delimit turns. Mistral wraps instructions in [INST] and [/INST]. Llama 3 uses <|start_header_id|> and <|end_header_id|>. If you send raw text to an instruct model without the right template, you're essentially speaking the wrong dialect and will get degraded output. Libraries like Hugging Face transformers and tokenizers handle this automatically when you use apply_chat_template.
The alignment tax
There's a persistent debate in ML research about whether RLHF alignment reduces raw capability — sometimes called the alignment tax. Early evidence suggested aligned models scored slightly lower on certain benchmarks than their base counterparts. More recent work (and larger-scale alignment runs) suggests this gap has largely closed, and that alignment sometimes improves performance on reasoning benchmarks by encouraging the model to think step by step before answering.
Reasoning models: a fourth flavor
A newer category has emerged beyond instruct and chat: reasoning models (OpenAI's o-series, DeepSeek-R1, Anthropic's extended thinking variants). These are further tuned — typically with reinforcement learning on verifiable problems — to produce explicit chain-of-thought reasoning before generating a final answer. The "thinking" tokens are often hidden from the end user but consume significant compute. Reasoning models excel at math, coding, and multi-step logic problems where a standard instruct model makes careless errors.
FAQ
Can I access a base model directly through the OpenAI or Anthropic API?
No. OpenAI and Anthropic only expose aligned instruct/chat models through their public APIs. To experiment with a raw base model, you need an open-weight family like Meta Llama 4 or Mistral, which releases both pretrained and instruct variants on Hugging Face.
Is fine-tuning a base model better than fine-tuning an instruct model?
It depends on your dataset size and how different your desired behavior is from standard instruction-following. With small datasets (under a few hundred examples), start from the instruct model — instruction-following is already baked in and you just need to teach domain content. Base models need more training data to reach the same baseline helpfulness, but give you more flexibility to shape behavior from scratch.
What does 'instruction tuning' actually mean?
Instruction tuning is supervised fine-tuning on a dataset of (instruction, ideal response) pairs. The model is trained using the same next-token prediction objective used in pre-training, but applied to carefully curated examples of a human giving a task and an expert (or another model) providing the right response. This teaches the model what 'being helpful' looks like, not just what text looks like in general.
What is the difference between RLHF and DPO?
Both are alignment techniques that use human preference data — pairs of responses where a human says which one is better. RLHF trains a separate reward model on those preferences, then runs reinforcement learning to optimize the LLM against that reward model. DPO skips the reward model and optimizes the LLM directly on the preference pairs. DPO is simpler and cheaper (40–75% less compute) and reaches comparable alignment quality, which is why many newer models use it.
Why do some model names end in '-it' and others in '-Instruct'?
These are just different naming conventions by different labs for the same concept: a model that has been instruction-tuned after pretraining. Meta uses '-Instruct', Google's Gemma uses '-it' (instruction-tuned), Mistral uses '-Instruct', Qwen uses '-Instruct'. There is no technical distinction — always check the model card to understand exactly what post-training was applied.
Do I need a system prompt when calling an instruct model, or can I just send user messages?
You can send user messages without a system prompt and a modern instruct model will still behave helpfully — the instruction-following ability is in the weights, not the system prompt. However, a well-crafted system prompt lets you precisely control tone, format, persona, and constraints, and consistently improves output quality for production use cases. Think of the system prompt as free customization you should almost always use.