
What Is a Foundation Model? The Base Models Behind Modern AI

Understand what makes a model a 'foundation model', why one base model can power thousands of apps, and how it differs from task-specific models.

Beginner · 11 min read · Updated 2026-06-13

In plain English

A foundation model is one large AI model, trained once on a huge, broad pile of data, that thousands of different applications can build on top of. The word that matters is foundation: just like a building's foundation isn't the apartment, the shop, or the office above it — but every one of them rests on it — a foundation model isn't a chatbot, a coding tool, or a customer-support bot, yet all of those can be built from the same base model.


Picture a brand-new university graduate. They've read an enormous amount across many subjects and picked up broad general knowledge, but they don't yet know your job. With a short briefing (a few examples, a clear instruction) they can do customer support today, draft contracts tomorrow, and summarize research the day after — same person, many jobs. A foundation model is that graduate: broadly capable out of the box, then adapted to a specific task by prompting it or training it a little further.

The contrast is a narrow, task-specific model: one trained from scratch to do exactly one thing, like detect spam or recognize a cat in a photo. That model is excellent at its one job and useless at everything else. A foundation model flips this — it's a generalist base that becomes a specialist on demand. Most of the AI you hear about today, including every large language model, is a foundation model.

Why it matters

Before foundation models, building an AI feature meant collecting a labeled dataset and training a fresh model for that one task. Want sentiment analysis? Gather thousands of labeled reviews and train a sentiment model. Want translation? Start over with a translation dataset. Every capability was a separate, expensive project from zero.

Foundation models broke that pattern with a simple but powerful economic shift: train once, adapt many times. One organization spends an enormous amount of money and compute to train a single broad model — then everyone else reuses it. You don't start from scratch; you start from a model that already understands language, code, and a great deal about the world, and you nudge it toward your task.

  • Cost moves to one place. The multi-million-dollar training run happens once. Building an app on top is cheap by comparison — often just writing a good prompt or a small fine-tune.
  • One base, endless products. The same model can power a writing assistant, a code helper, a search tool, and a tutoring app. Capability that was years of work per task is now a prompt away.
  • Skills you didn't train for show up for free. Because the model learned from such broad data, it can often do tasks nobody explicitly designed it for — a property called emergence. A model trained just to predict text turns out to translate, summarize, and reason.
  • Faster iteration. Changing what your app does can be as small as changing the instructions you send the model, instead of retraining anything.

Who should care? Almost anyone building software today. If you've ever called an API to summarize text, answer questions, or generate an image, you used a foundation model without training one yourself. That's the whole point: the hard, expensive part is done once by a few labs, and the rest of us build on top. This is also why a handful of base models end up underneath a huge fraction of the AI products you use — a concentration that's worth understanding, for both its leverage and its risks.

How it works

A foundation model's life has two distinct stages. Pre-training builds the broad, general model — this is the costly, once-per-model step. Adaptation shapes that general model into something useful for a specific job — this is what builders do, over and over, cheaply.

Pre-training: build the broad base

Pre-training feeds the model a massive, diverse dataset — for a language model, that's a large slice of the public internet, books, and code — and trains it on a simple self-supervised task: predict the next piece of text. No human has to label the data; the next word is the answer, so the model can learn from raw text at enormous scale. (For the mechanics of that prediction loop, see how LLMs work.) Doing this across trillions of words forces the model to absorb grammar, facts, reasoning patterns, and world knowledge as a side effect of getting good at prediction.
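To make "the next word is the answer" concrete, here is a minimal sketch of how raw text can be turned into training examples with no human labeling. It splits on whitespace for readability, whereas real models work on subword tokens; the function name `next_word_pairs` and the sample sentence are illustrative assumptions, not part of any library.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs.

    Illustrative only: splits on whitespace, whereas real language
    models operate on subword tokens.
    """
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        # The context is up to `context_size` words before position i;
        # the target is simply the word that actually comes next.
        context = words[max(0, i - context_size):i]
        pairs.append((context, words[i]))
    return pairs


pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Every pair comes straight from the text itself, which is why this kind of training can scale to very large amounts of unlabeled data.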

This stage is what makes the model "foundational" — and what makes it expensive. It needs huge clusters of specialized chips running for weeks or months, which is why these models need GPUs. The pattern that more data plus more compute plus a bigger model reliably yields a better model is described by scaling laws, and it's a big reason only well-funded labs train frontier foundation models.
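As a rough sketch of what a scaling law looks like, one commonly cited parametric form writes the expected loss as a floor plus two power-law terms, one shrinking with model size and one with training data. The constants below are placeholders chosen only to show the shape of the curve, not fitted values from any paper, and the 20-tokens-per-parameter ratio is likewise just an illustrative choice.

```python
def estimated_loss(params, tokens, E=2.0, A=400.0, B=400.0, alpha=0.3, beta=0.3):
    """Power-law loss estimate: E + A / N**alpha + B / D**beta.

    N is the parameter count and D the number of training tokens.
    All constants are hypothetical placeholders, not published fits.
    """
    return E + A / params**alpha + B / tokens**beta


# Grow the model and its data together and watch the estimate fall.
for params in (1e8, 1e9, 1e10):
    tokens = 20 * params  # illustrative data-to-parameter ratio
    loss = estimated_loss(params, tokens)
    print(f"{params:.0e} params, {tokens:.0e} tokens -> loss {loss:.3f}")
```

The estimate keeps improving as both inputs grow, but each tenfold increase buys a smaller reduction and the loss never drops below the floor `E`. That diminishing-returns shape is part of why frontier training runs are so costly.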

Adaptation: shape the base to a task

Out of pre-training you get a base model: knowledgeable but raw, good at continuing text rather than following instructions. To make it useful, you adapt it. There are three common ways, from cheapest to most involved:

  1. Prompting: describe the task in plain words (and maybe a few examples) in the input. No training at all; you steer the existing model. This is how most people use foundation models.
  2. Fine-tuning: train the base model a little further on your own examples so it specializes in your task or style. See "What is fine-tuning?".
  3. Alignment training: what model providers do to turn a raw base model into a helpful, safe assistant that follows instructions (this is the distinction between base and instruct models).

The key idea: the same foundation model can branch into many specialized models and apps, each a small adaptation of one shared base. That branching is the source of all the leverage.
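Prompting with "a few examples" is often called few-shot prompting. Below is a minimal sketch of assembling such a prompt as plain text. The helper name `build_few_shot_prompt`, the `Input:`/`Output:` layout, and the sample reviews are assumptions made for illustration; no particular model requires this exact format.

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble an instruction, worked examples, and a new input into one prompt."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # Leave the final output blank so the model continues from here.
    lines.append(f"Input: {new_input}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [
        ("Great battery life.", "positive"),
        ("Stopped working in a week.", "negative"),
    ],
    "The screen is gorgeous.",
)
print(prompt)
```

Nothing about the model changes here. The examples travel inside the input, which is what makes prompting the cheapest of the three adaptation routes.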

Foundation model vs LLM vs base model

These three terms get mixed up constantly because they overlap — but they answer different questions. "Foundation model" is about role, "LLM" is about what it works on, and "base model" is about training stage.

| Term | What it really means | Quick test |
| --- | --- | --- |
| Foundation model | A model trained broadly enough to be adapted to many downstream tasks. Says nothing about modality or size. | Do many different apps build on this one model? |
| LLM (large language model) | A foundation model whose data and outputs are text. The most common kind of foundation model, but not the only kind. | Does it work primarily with language/text? |
| Base model | A foundation model at the stage right after pre-training, before instruction/alignment tuning. | Has it been turned into a helpful assistant yet, or is it still raw? |

So the relationships are: every LLM is a foundation model, but not every foundation model is an LLM. Foundation models also include image models, audio models, and multimodal models that handle text, images, and sound together. The category is the bigger circle; LLMs are one important slice of it. (For where these sit relative to "generative AI" and "AGI," see LLM vs generative AI vs AGI.)

A concrete example

Imagine a software team that needs three features: a tool that turns support emails into a structured ticket, a chatbot that answers product questions, and a feature that translates user reviews into English. In the old world, that's three separate models trained on three separate datasets.

With a foundation model, all three are the same base model with different instructions wrapped around it. No new training is required — just different prompts to one shared API:

three features, one foundation model (Python)
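from anthropic import Anthropic

# Placeholder key, left elided on purpose; supply your own.
client = Anthropic(api_key="sk-ant-...")
MODEL = "claude-sonnet-4-6"  # one foundation model, reused for all tasks

def run(instruction, text):
    """Send one instruction plus its input text to the shared model."""
    msg = client.messages.create(
        model=MODEL,
        max_tokens=400,
        messages=[{"role": "user",
                   "content": f"{instruction}\n\n{text}"}],
    )
    # The reply is a list of content blocks; the first one holds the text.
    return msg.content[0].text

# Hypothetical sample inputs so the snippet runs end to end
email = "Hi, my router keeps dropping Wi-Fi and I have a demo tomorrow."
question = "Does the router support guest networks?"
review = "El producto llegó rápido y funciona muy bien."

# Feature 1: structure a support email into a ticket
print(run("Extract the customer's issue, urgency, and product as JSON.", email))

# Feature 2: answer a product question
print(run("Answer this question about our product, politely and briefly.", question))

# Feature 3: translate a review to English
print(run("Translate the following review into natural English.", review))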

Three very different capabilities, zero model training, one base model. Swap instruction and you swap the feature. That is the train-once-adapt-many economics in a single screen of code — and it's why a small team can ship AI features that would once have needed a research lab per task.
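One way to make "swap the instruction, swap the feature" explicit is to keep the instructions in a lookup table and pass the model call in as a function. The sketch below reuses the three instructions from the example; the names `FEATURES`, `run_feature`, and `fake_model` are hypothetical, and `fake_model` stands in for a real API call so the snippet runs offline.

```python
# Each feature is nothing more than a name mapped to an instruction.
FEATURES = {
    "ticket": "Extract the customer's issue, urgency, and product as JSON.",
    "answer": "Answer this question about our product, politely and briefly.",
    "translate": "Translate the following review into natural English.",
}


def run_feature(name, text, call_model):
    """Look up a feature's instruction and send it, with the text, to the model."""
    instruction = FEATURES[name]
    return call_model(f"{instruction}\n\n{text}")


def fake_model(prompt):
    """Stand-in for a real API call; it only reports what it was asked."""
    return f"[model received {len(prompt)} characters]"


print(run_feature("translate", "Muy buen producto.", fake_model))
```

Under this layout, adding a fourth feature means adding one dictionary entry, and switching to a different foundation model means passing a different `call_model` function.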

Going deeper

Once the basic picture clicks, a few deeper themes are worth knowing — they explain a lot of the current debate around foundation models.

Multimodality. Early foundation models handled one type of data. Modern ones increasingly handle several — reading an image, listening to audio, and writing text in the same model. A multimodal foundation model learns a shared representation across modalities, which is why one model can describe a photo, transcribe speech, and answer questions about a chart. The "foundation" idea is the same; the inputs just got richer.

Emergent abilities and scaling. A striking finding is that some capabilities seem not to appear gradually: they are nearly absent in small models and then show up once a model crosses a size threshold. How sharp these jumps really are is still debated, since the choice of evaluation metric can make gradual improvement look sudden. The pattern is tied to scaling laws, and it's part of why labs keep building bigger models: you can't always predict from a small one what a larger one will be able to do. It also makes evaluation hard, because a model may quietly be capable of something nobody tested for.

Concentration and risk. Because pre-training is so expensive, a small number of organizations train the foundation models that the rest of the ecosystem depends on. That gives enormous leverage but also concentrates risk: a flaw, bias, or vulnerability in one base model propagates to every app built on it. This homogenization, in which many systems inherit the same strengths and the same weaknesses, was one of the central concerns raised by the 2021 Stanford report that introduced the term "foundation model", and it's a live policy topic today.

Open vs closed. Some foundation models are released with open weights you can download and run yourself; others are available only through an API. Open models give you control, privacy, and the ability to fine-tune deeply; closed models often lead on raw capability and are simpler to use. Which to pick depends on your needs for cost, control, and capability — not on one being universally "better."

The honest summary: a foundation model is less a finished product than a substrate. Its value comes from everything built on top of it, which is exactly why understanding the base — its training, its limits, and its biases — matters even when all you do is call an API. The next steps from here are seeing how an LLM actually predicts text and how an everyday assistant like ChatGPT is assembled from one of these bases.

FAQ

What is a foundation model in simple terms?

A foundation model is one large AI model trained on broad, general data that many different applications build on top of. Instead of training a separate model for every task, you adapt one shared base model — by prompting or fine-tuning it — to do customer support, coding, translation, and much more. It's a generalist base that becomes a specialist on demand.

What is the difference between a foundation model and an LLM?

An LLM (large language model) is a foundation model that works with text. "Foundation model" is the broader category — it also includes image, audio, and multimodal models. So every LLM is a foundation model, but not every foundation model is an LLM. The terms overlap because most famous foundation models today happen to be LLMs.

What is the difference between a base model and a foundation model?

They're closely related. "Foundation model" describes the role — a broad base many apps build on. "Base model" usually refers to that model at the stage right after pre-training, before it's been instruction-tuned and aligned into a helpful assistant. A base model is a foundation model in its raw, pre-alignment form.

Why are they called foundation models?

Because they act like a foundation in construction: incomplete on their own, but the base that countless other systems are built on. Stanford researchers coined the term in 2021 to capture this role — the name highlights that the model is foundational to an ecosystem of downstream apps, not that it's any particular size or architecture.

Are foundation models the same as generative AI?

Not exactly. Generative AI refers to any system that creates new content (text, images, audio). Most modern generative AI is powered by foundation models, but "foundation model" describes the broadly-trained, reusable base, while "generative AI" describes what the system does. They often point at the same product from different angles.

Do I need to train a foundation model to use one?

No. Training a foundation model from scratch costs millions of dollars and requires huge GPU clusters, which is why only a few labs do it. As a builder you reuse an existing foundation model through an API or open weights, and adapt it by prompting or light fine-tuning. That reuse is the entire economic advantage.

Further reading