In plain English
A large language model (LLM) is a computer program trained to do one deceptively simple thing: look at a piece of text and predict what comes next. That's it. Every chatbot answer, every AI-written email, every block of generated code is that one trick — guess the next chunk of text — repeated over and over, very fast.
The everyday analogy is the autocomplete on your phone keyboard. Type "see you" and it suggests "later." An LLM is that idea scaled to an absurd degree: instead of learning from your last few texts, it has digested trillions of words — books, websites, code, forum posts, documentation. And instead of suggesting one word from a short list, it weighs every possible next chunk of text against everything you've written so far, including instructions, questions, and code.
Here's the part that surprises people: nobody programmed an LLM with grammar rules, facts, or logic. It learned all of that as a side effect of getting really good at next-word prediction. To reliably finish the sentence "The capital of France is ___", the model has to encode something that behaves like knowing geography. To finish a half-written Python function, it has to encode something that behaves like knowing Python. Scale that across trillions of examples and you get a system that acts knowledgeable about almost everything humans have written down.
Why it matters
Before LLMs, language software was built one narrow task at a time. A spam filter was one hand-built system. A translator was another. A sentiment classifier, a grammar checker, a search ranker — each needed its own team, its own training data, and months of work. None of them could do anything outside its lane.
LLMs flipped that. One general-purpose model can summarize a contract, translate it to German, draft a reply, and write the Python script that does all three — and you ask for each task in plain English, not code. The programming interface for an LLM is just... sentences. That's why the field exploded after ChatGPT's launch in late 2022: suddenly anyone who could type could 'program' a computer.
Who should actually care:
- Developers — LLMs are now a standard building block, called through an API like a database or a payment processor. Whole product categories (coding assistants, AI agents, RAG search) are LLM wrappers with engineering around them.
- Anyone who writes for a living — drafting, editing, summarizing, and translating are exactly the next-word-prediction sweet spot.
- Decision makers — knowing what an LLM is (a text predictor) and is not (a fact database) is the difference between deploying one safely and shipping confident nonsense to customers.
The thing LLMs replaced isn't just older software — it's the assumption that computers need precise, formal instructions. That assumption held for 70 years. It doesn't anymore.
How it works
First, the model never sees letters or words. Text gets chopped into tokens — chunks that are usually a short word or a piece of a longer one ("unbelievable" might become un + believ + able). Tokens are the model's atoms; everything it reads and writes is measured in them. We cover the details in what is a token.
Generation is a loop. The model reads all the tokens so far, computes a probability score for every token in its vocabulary (typically 50,000–250,000 options), picks one, sticks it on the end, and runs again. One token per pass. A 300-word answer is roughly 400 trips around this loop:
Where the 'knowledge' comes from
Those probability scores come from training. During training, the model reads enormous amounts of text and plays fill-in-the-blank billions of times: predict the next token, check the real answer, nudge the internal parameters to be slightly less wrong, repeat. After enough rounds — months on thousands of GPUs — the parameters encode the statistical structure of human language and a huge amount of factual association along with it.
A freshly trained model is just a raw text-completer, though — ask it a question and it might respond with more questions, because that's what often follows questions on the internet. A second, shorter phase teaches it to behave like an assistant: follow instructions, answer rather than continue, refuse harmful requests. That difference is the base vs instruct model split, and it's why ChatGPT feels like a helpful colleague instead of a deranged autocomplete.
Two limits to internalize on day one. First, the model can only "see" a fixed amount of text at once — its context window. Anything outside it may as well not exist. Second, its knowledge is frozen at training time; it learns nothing new from talking to you. For the deeper mechanics of prediction, see how LLMs actually work.
LLMs you've already met
Chances are you've used several LLMs without thinking about it. The chatbot is just the front door — the same model families power coding assistants, search summaries, and customer-support bots. Open weights means you can download the model file and run it on your own hardware; closed models are reachable only through the maker's API.
| Model family | Made by | Open weights? | Best known for |
|---|---|---|---|
| Claude | Anthropic | No | Coding, agents, long-document work |
| GPT | OpenAI | Mostly no | Powers ChatGPT, the mainstream default |
| Gemini | No | Deep integration with Search and Workspace | |
| Llama | Meta | Yes | The standard base for local and fine-tuned models |
| Mistral | Mistral AI | Many yes | Small, efficient European models |
| DeepSeek / Qwen | DeepSeek, Alibaba | Yes | Strong open models, aggressive efficiency |
Talk to an LLM in code
You don't need an API key or a paid account to poke at a real LLM. The snippet below downloads GPT-2 — a small, ancient (2019) open-weights model — and runs the generation loop on your own machine. It's a toy by today's standards, which is exactly why it's instructive: you can watch raw next-token prediction without the polish.
# pip install transformers torch
from transformers import pipeline
# Downloads ~500 MB the first time, then runs fully offline
generator = pipeline("text-generation", model="gpt2")
result = generator(
"The capital of France is",
max_new_tokens=20,
)
print(result[0]["generated_text"])Run it a few times. You'll usually get "Paris" — and then the model will keep rambling, because GPT-2 is a base model with no assistant training: it doesn't answer, it continues. You'll also get different output each run, because the loop picks tokens with a controlled dose of randomness. Production LLMs are the same machine with three upgrades: thousands of times more parameters, far more training data, and an instruction-following phase bolted on top.
What an LLM is not
Most LLM disasters trace back to treating the model as something it isn't. Three corrections worth memorizing:
- Not a database. An LLM doesn't store documents and look them up. Facts are smeared across billions of parameters as statistical tendencies. That's why it can't tell you where it learned something, and why it sometimes states falsehoods with total confidence — see why LLMs hallucinate.
- Not connected to the internet (by itself). The raw model knows nothing after its training data was collected. When a chatbot cites today's news, that's a separate search tool feeding results into the model's context, not the model knowing things.
- Not reasoning the way you do. It produces text that is statistically shaped like reasoning, and that's often good enough to be genuinely useful — but it can fail in ways no human would, like fumbling arithmetic mid-sentence while writing a flawless essay around it.
Going deeper
Everything above describes the behavior. The machinery underneath is the transformer, an architecture introduced in the 2017 paper Attention Is All You Need. Its core operation, attention, lets every token look at every other token in the input and decide which ones matter for predicting what comes next — which is how the model connects a pronoun on line 40 to a name on line 2. Nearly every modern LLM is a decoder-only transformer stack: dozens of identical layers, each refining the representation of the text before a final layer turns it into next-token probabilities.
A few threads that matter once the basics click. Scale: capability tracked surprisingly smooth curves as labs grew parameters, data, and compute together — the scaling-laws result that justified billion-dollar training runs, though the frontier has since shifted toward squeezing more from post-training and letting models 'think longer' at answer time. Efficiency: many frontier models are now mixture-of-experts designs, where only a fraction of the parameters activate per token, so a huge model runs at a mid-sized model's cost. Inference economics: serving an LLM means re-reading the whole conversation for every new token; tricks like KV caching keep that from being quadratic pain, but long chats genuinely do cost more — this is why context length is a pricing axis, not just a feature.
The open problems are honest ones. Hallucination isn't a bug to be patched — confident next-token prediction over incomplete knowledge is the mechanism, so mitigation means grounding (retrieval, tools, citations) rather than elimination. Interpretability — actually reading what the billions of parameters encode — is an active research field, not a solved one. And the boundary between 'statistical pattern-matching' and 'understanding' remains a live philosophical fight, which is healthy: you can build very real systems on top of LLMs without settling it. When you're ready, the natural next step is the prediction mechanics in the next article, then outward to prompting, RAG, and agents — the engineering layers that turn a text predictor into software.
FAQ
What does LLM stand for in AI?
LLM stands for Large Language Model — 'large' for the billions of trainable parameters, 'language' because it's trained on text, and 'model' because it's a mathematical function that maps input text to predicted output text. It is not a database or a search engine.
Is ChatGPT the same thing as an LLM?
Not exactly. ChatGPT is an app; the LLM (a GPT-series model) is the engine inside it. The app layers extra features — conversation memory, web search, tools — on top of the raw model. Claude, Gemini, and other assistants have the same app-versus-model split.
Do large language models actually understand what they say?
Functionally, they behave as if they understand a lot — they track context, resolve ambiguity, and follow complex instructions. Mechanically, they are predicting the next token from learned statistical patterns. Whether that counts as 'real' understanding is a genuinely open debate; in practice, judge them by reliability on your task, not by the philosophy.
Can I run a large language model on my own laptop?
Yes. Open-weights models like Llama, Mistral, and Qwen come in small sizes that run on a recent laptop using tools like Ollama or llama.cpp. They're weaker than frontier API models, but completely private and free to run. Tiny older models like GPT-2 will run on almost anything.
Why do LLMs make things up if they were trained on real text?
Because they store statistical tendencies, not retrievable documents. When the model is asked about something thinly covered in training data, the next-token machinery still produces a fluent, confident-sounding answer — there's no built-in 'I don't know' lookup. This is called hallucination, and it's mitigated with retrieval and tools, not eliminated.