In plain English
An LLM doesn't read words. It reads tokens — chunks of text the model was trained to recognize. A token is often a whole common word (the, dog), but it can also be part of a word (token + ization), a single character, a space, or a punctuation mark. If you're fuzzy on what a token even is, start with What Is a Token in an LLM?.
So when you want to know "will my 4,000-word document fit?" or "how much will this prompt cost?", you have to translate between three units: characters (what you typed), words (how you think), and tokens (what the model and the bill actually count).
Here's the everyday analogy. Think of packing a suitcase. Characters are individual socks. Words are outfits. Tokens are the little packing cubes the airline weighs. You plan in outfits, but you get charged by the cube — and an outfit doesn't always fit in one cube. Estimating tokens is just learning the rough number of cubes per outfit, plus knowing the cases where one outfit needs three cubes.
Why it matters
Tokens are the unit everything is measured in. You don't pay per word and you don't get a word limit — you pay per token and you get a token limit. Three concrete reasons to be able to estimate them on the back of an envelope:
- Cost. API providers bill per million input and output tokens. If you can eyeball that a 10-page report is ~6,000 tokens, you can predict the bill before you press send. See LLM API Pricing.
- Fit. Every model has a context window measured in tokens — the max it can read at once. "Does my 300-page PDF fit in 200K tokens?" is a token-estimation question, and getting it wrong means content gets silently dropped.
- Speed. Output is generated one token at a time, so longer answers literally take longer. A 500-token reply streams roughly twice as long as a 250-token one.
The catch: the conversion rate is a rough average, not a law of physics. Code, numbers, non-English text, and weird formatting all break the 0.75-words rule — sometimes badly. Knowing both the rule and its exceptions is the whole skill.
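To make the cost point concrete, here is a minimal sketch of the back-of-envelope arithmetic. The two prices are hypothetical placeholders chosen for illustration, not any provider's actual rates; swap in the numbers from your provider's pricing page.

```python
# Back-of-envelope cost estimate. The per-million-token prices below are
# HYPOTHETICAL placeholders, not real rates for any provider.
INPUT_PRICE_PER_MTOK = 3.00    # assumed: dollars per 1M input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # assumed: dollars per 1M output tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one request."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# A ~6,000-token report in, a ~500-token summary out.
cost = estimate_cost(input_tokens=6_000, output_tokens=500)
print(f"${cost:.4f}")
```

Because the token counts are themselves estimates, treat the result as a ballpark rather than an invoice.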
How it works
Modern LLMs tokenize text with subword algorithms, usually a variant of Byte-Pair Encoding (BPE), covered in How Does Tokenization Work?. The idea: build a fixed vocabulary (tens to hundreds of thousands of pieces) where common words get their own token and rarer words get split into familiar fragments. The pipeline from your text to a bill is short: the tokenizer splits your text into tokens, the tokens are counted, and the count is multiplied by the per-token price.
Why ~4 characters per token for English? Because the average English word is about 5 characters plus a space, and common words map to a single token, but enough longer or rarer words get split into 2+ pieces that the average settles near 4 characters. That's where the 0.75-words rule comes from. To see the three units side by side, take the sentence "Tokenization is unbelievably handy.":
- 35 characters (including spaces and the period)
- 4 words (how a human counts)
- ~7 tokens (`Token`+`ization`, `un`+`believ`+`ably`…; rare or long words get split)
Notice what happened: is is one token, but unbelievably got chopped into pieces. Common short words are cheap; long or unusual words cost more tokens than you'd expect. The rule of thumb works because these average out across a paragraph — it gets shaky on a single short string.
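The splitting behaviour can be imitated with a toy. This is not real BPE, which learns its vocabulary and merge rules from data; it is a greedy longest-match splitter over a small hand-picked vocabulary (`TOY_VOCAB` and `toy_tokenize` are made up for this sketch). It only shows why words in the vocabulary stay whole while others fall apart into fragments.

```python
# Toy illustration only: real tokenizers learn their vocabulary and merge
# rules from data. TOY_VOCAB is a hypothetical, hand-picked vocabulary.
TOY_VOCAB = {"Token", "ization", " is", " un", "believ", "ably", " handy", "."}


def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary piece that
    matches at the current position; unknown characters become their own
    single-character token."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in vocab:
                tokens.append(candidate)
                pos += size
                break
        else:
            # No vocabulary piece matches here: fall back to one character.
            tokens.append(text[pos])
            pos += 1
    return tokens


sentence = "Tokenization is unbelievably handy."
pieces = toy_tokenize(sentence, TOY_VOCAB)
print(pieces)
print(len(sentence), "characters,", len(sentence.split()), "words,", len(pieces), "tokens")
```

The toy's token count depends entirely on the made-up vocabulary, so don't expect it to match a real tokenizer's count for the same sentence.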
The conversion cheat sheet
Here are the numbers worth keeping in your head. They assume ordinary English prose and a modern tokenizer. Treat them as ±10–20%, not exact.
| You have | Apply | To estimate |
|---|---|---|
| Words | × 1.33 | Tokens |
| Tokens | × 0.75 | Words |
| Characters (English) | ÷ 4 | Tokens |
| Tokens | × 4 | Characters |
| A4 / Letter page (~500 words) | × 1.33 | ~650–700 tokens |
Some anchors that come up constantly when you're sizing prompts and documents:
- 1,000 tokens ≈ 750 words ≈ 1.5 pages of single-spaced text.
- A typical chat message (a sentence or two) is 20–60 tokens.
- A short blog post (~800 words) is ~1,100 tokens.
- A novel (~100,000 words) is ~130,000–150,000 tokens — which is why "does a whole book fit in a 200K context window?" is usually a yes now.
- This article you're reading (~1,800 words) is roughly 2,400 tokens.
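The cheat sheet fits in a few lines of code. The sketch below uses 4/3 tokens per word (the exact form of the 1.33 multiplier) and 4 characters per token; the `fits` helper and its 20% safety margin are assumptions added for illustration, reflecting the ±10–20% error bar above.

```python
import math

# Rule-of-thumb ratios for ordinary English prose (expect 10-20% error).
TOKENS_PER_WORD = 4 / 3  # equivalently ~0.75 words per token
CHARS_PER_TOKEN = 4


def tokens_from_words(words: int) -> int:
    """Estimate tokens from a word count, rounding up."""
    return math.ceil(words * TOKENS_PER_WORD)


def tokens_from_chars(chars: int) -> int:
    """Estimate tokens from a character count, rounding up."""
    return math.ceil(chars / CHARS_PER_TOKEN)


def fits(estimated_tokens: int, context_window: int, margin: float = 0.2) -> bool:
    """True if the estimate, padded by a safety margin, fits the window."""
    return estimated_tokens * (1 + margin) <= context_window


blog_post = tokens_from_words(800)
novel = tokens_from_words(100_000)
print(blog_post)                             # 1067
print(novel)                                 # 133334
print(fits(novel, context_window=200_000))   # True
```

Rounding up and padding by a margin errs on the side of overestimating, which is the safer direction when the question is whether something fits.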
When the 0.75-words rule lies
The 0.75-words rule is tuned for English prose. The moment your text stops looking like an English novel, the ratio shifts — sometimes a lot. Know these four landmines:
1. Code and numbers cost more
Source code is full of symbols, indentation, camelCase, snake_case, and long digit strings, all of which fragment into many tokens. A line of Python often runs closer to 3 characters per token than 4, and a UUID or a long number can cost one token for every few characters. Budget code at roughly 2.5–3.5 characters per token to be safe.
2. Other languages cost much more
Tokenizers are trained mostly on English, so non-English text is split into smaller pieces. As of mid-2026, the rough picture verified across providers: most European languages run ~1.5–2× the tokens of equivalent English, and CJK languages (Chinese, Japanese, Korean) run ~2–3× — Chinese often lands near 1 token per 2 characters, Japanese near 1 per 3. Some low-resource languages are far worse. Same meaning, more tokens, higher bill.
3. Whitespace and markup inflate counts
JSON, HTML, Markdown tables, and deeply-indented text spend tokens on braces, tags, and runs of spaces. A pretty-printed JSON blob can have a third of its tokens going to structure rather than data.
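A quick way to see the overhead is to serialize the same data twice with Python's standard `json` module and compare sizes. The `record` below is made-up sample data. Character counts are only a proxy here: tokenizers often merge runs of spaces, so the token savings may be smaller than the character savings suggest.

```python
import json

# Made-up sample data, for illustration only.
record = {"user": {"id": 42, "name": "Ada", "tags": ["admin", "beta"]}, "active": True}

# Same data, two layouts: indented for humans vs. minified for the model.
pretty = json.dumps(record, indent=4)
compact = json.dumps(record, separators=(",", ":"))

print(len(pretty), "chars pretty-printed")
print(len(compact), "chars compact")
print(f"{1 - len(compact) / len(pretty):.0%} fewer characters after minifying")
```

If a payload is going to a model rather than a human, minifying it first is usually a cheap win.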
4. The tokenizer itself changes between models
Different model families use different tokenizers, so the same text counts differently across providers — and even across versions of one provider. As of mid-2026, Anthropic's docs note that the tokenizer introduced with Claude Opus 4.7 produces roughly 30% more tokens for the same text than earlier Claude models. So a token count you measured last year may be wrong for this year's model.
- English prose: ~4 chars/token (the friendly baseline)
- Code: ~2.5–3.5 chars/token (symbols and indentation split)
- CJK text: ~2–3× the tokens of equivalent English
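Those adjusted ratios can be folded into a content-aware estimator. This is a sketch that takes midpoints of the ranges quoted in this section as assumptions; the function names are hypothetical, and real counts vary by tokenizer.

```python
import math

# Rough ratios taken from the ranges in this section. They are planning
# estimates, not measurements of any specific tokenizer.
CHARS_PER_TOKEN = {
    "english": 4.0,  # ordinary prose baseline
    "code": 3.0,     # assumed midpoint of the 2.5-3.5 budget range
}
CJK_MULTIPLIER = 2.5  # assumed midpoint of the ~2-3x range vs. English


def estimate_tokens(char_count: int, kind: str) -> int:
    """Estimate tokens from a character count for English prose or code."""
    return math.ceil(char_count / CHARS_PER_TOKEN[kind])


def estimate_cjk_tokens(english_equivalent_tokens: int) -> int:
    """Estimate tokens for CJK text carrying the same meaning as an
    English text of the given token count."""
    return math.ceil(english_equivalent_tokens * CJK_MULTIPLIER)


print(estimate_tokens(12_000, "english"))  # 3000
print(estimate_tokens(12_000, "code"))     # 4000
print(estimate_cjk_tokens(3_000))          # 7500
```

For mixed documents, estimate each part with its own ratio and add the results.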
Count exactly, don't guess
When the estimate isn't good enough, run the real tokenizer. For OpenAI-family models, the open-source tiktoken library does it locally and for free — no API call. Its modern encoding (o200k_base) matches recent GPT models; the older cl100k_base matches GPT-3.5/4-class models.
```python
# pip install tiktoken
import tiktoken

# o200k_base = encoding used by recent OpenAI models
enc = tiktoken.get_encoding("o200k_base")

samples = {
    "english": "Tokenization is unbelievably handy.",
    "code": "for i in range(len(items)): total += items[i].price",
    "chinese": "标记化非常方便。",
}

for name, text in samples.items():
    ids = enc.encode(text)
    chars = len(text)
    # chars-per-token reveals where the 4.0 rule holds vs breaks
    print(f"{name:8} {len(ids):3} tokens {chars:3} chars "
          f"{chars/len(ids):.1f} chars/token")

# Approximate output (token counts are rough; character counts are exact):
# english   ~7 tokens  35 chars  ~5.0 chars/token
# code     ~17 tokens  51 chars  ~3.0 chars/token  <- denser
# chinese   ~9 tokens   8 chars  ~0.9 chars/token  <- much denser
```

For Claude, the tokenizer isn't public, so you count via the API. Anthropic exposes a free count_tokens endpoint that returns the exact input-token count for a request (including system prompt, tools, images, and PDFs) before you actually send it. It costs nothing to call and uses the same tokenizer as the model you name.
```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

result = client.messages.count_tokens(
    model="claude-opus-4-8",
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "How many tokens is this?"}],
)
print(result.input_tokens)  # exact count, free, no message created
```

Going deeper
Once the basic conversions are second nature, a few subtler points separate a rough guess from a reliable one.
Tokenizer vocabulary size shifts the ratio
A bigger vocabulary can represent more words as a single token, so it tends to use fewer tokens for the same text. OpenAI's o200k_base has a ~200K-piece vocabulary versus ~100K for the older cl100k_base, and on typical English it produces slightly fewer tokens. But bigger isn't free — every vocabulary entry adds to the model's embedding table, which ties into why LLMs need GPUs. Vocabulary design is a real trade-off, not a pure win.
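The embedding-table cost is simple multiplication: one vector per vocabulary entry. In the sketch below the hidden size of 4096 is a hypothetical model width picked for illustration, not the dimension of any particular model.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a token-embedding table: one vector per vocabulary entry."""
    return vocab_size * hidden_size


HIDDEN_SIZE = 4096  # hypothetical model width, chosen only for illustration

small = embedding_params(100_000, HIDDEN_SIZE)
large = embedding_params(200_000, HIDDEN_SIZE)
print(f"{small:,} vs {large:,} parameters")  # 409,600,000 vs 819,200,000 parameters
```

Doubling the vocabulary doubles the table, which is the price paid for slightly shorter token sequences.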
Token counts aren't additive across turns
It's tempting to count each message once and sum them. But in a chat, the entire conversation history is re-sent on every turn, plus a fixed overhead for role markers and chat-template tokens like <|im_start|> (model-specific). Each turn's input grows linearly with the history, so the cumulative input you are billed for grows roughly quadratically with turn count unless you trim history. A 10-turn conversation therefore costs far more than 10 single messages. This is the practical reason long chats get expensive and eventually hit the wall described in What Happens When You Exceed the Context Window?.
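A small simulation makes the growth visible. It assumes every message (user or assistant) is the same length and that nothing is trimmed or cached; `overhead_per_message` stands in for the model-specific chat-template tokens and is set to zero in the example to keep the arithmetic clean.

```python
def cumulative_input_tokens(turns: int, tokens_per_message: int,
                            overhead_per_message: int = 0) -> int:
    """Total input tokens billed across a chat where the full history is
    re-sent on every turn. Each turn adds one user message and one reply."""
    per_message = tokens_per_message + overhead_per_message
    total = 0
    history = 0
    for _ in range(turns):
        history += per_message  # the new user message joins the history
        total += history        # the whole history is sent as input
        history += per_message  # the assistant's reply joins the history
    return total


print(cumulative_input_tokens(10, 100))  # 10000
print(cumulative_input_tokens(20, 100))  # 40000
print(10 * 100)                          # 1000: ten standalone messages
```

Doubling the number of turns quadruples the total input, which is the quadratic growth in action.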
Why character-splitting causes the "strawberry" bug
Because the model sees tokens, not letters, it can't reliably count characters inside a token — it never saw strawberry as s-t-r-… in the first place. That's the root of the famous letter-counting failures, unpacked in the strawberry problem. The same blindness explains why models fumble reversing strings or doing digit-by-digit arithmetic on long numbers.
Output tokens are the expensive, slow ones
Input is processed in parallel, but output is generated one token at a time, each depending on the last. That's why output tokens are usually priced higher than input tokens and why long answers feel slow. If latency or cost matters, the highest-leverage move is often capping max_tokens and asking for concise output — not shrinking the prompt. This connects to how the model actually picks each next token, covered in How Do LLMs Actually Work?.
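As a rough model of why output length dominates latency: total time is a fixed wait for the first token plus a per-token streaming cost. The generation speed and time-to-first-token values below are hypothetical; real figures vary widely by model, provider, and load.

```python
def estimate_latency_seconds(output_tokens: int, tokens_per_second: float,
                             time_to_first_token: float = 0.5) -> float:
    """Rough response time: a fixed startup delay plus sequential generation.
    Both speed parameters are assumptions to be replaced with measurements."""
    return time_to_first_token + output_tokens / tokens_per_second


# Hypothetical speed of 50 tokens/second.
print(estimate_latency_seconds(250, tokens_per_second=50))  # 5.5
print(estimate_latency_seconds(500, tokens_per_second=50))  # 10.5
```

Halving the reply length removes almost half the wait, while shrinking the prompt only touches the startup term.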
FAQ
How many tokens is one word?
For ordinary English, about 1.33 tokens per word on average — equivalently, 1 token ≈ 0.75 words. Short common words are usually one token; long or rare words get split into several, so the per-word number is an average, not a fixed rate.
How many words is 1000 tokens?
Roughly 750 words of English prose, or about 1.5 pages of single-spaced text. For code or non-English text it will be fewer words, because those use more tokens per word.
How many characters are in a token?
About 4 characters per token for English, including spaces. Code and JSON run denser (~2.5–3.5 chars/token), and languages like Chinese or Japanese can be near 1 character per token, meaning many more tokens for the same content.
Why does the same text cost more tokens on some models than others?
Each model family uses its own tokenizer with a different vocabulary, so identical text splits differently. As of mid-2026, Anthropic notes the tokenizer introduced with Claude Opus 4.7 produces roughly 30% more tokens than earlier Claude models — so always re-count when you switch models.
How do I count tokens exactly instead of estimating?
For OpenAI-family models use the free tiktoken library locally with the o200k_base encoding. For Claude, call the free count_tokens API endpoint, which returns exact input tokens for your full request. Gemini and others offer an equivalent countTokens call.
Do I pay for output tokens too, or just my prompt?
Both. You're billed for input tokens (your prompt, system message, and any retrieved context) and output tokens (the model's reply). Output tokens are usually priced higher and are generated one at a time, so they're the slow, expensive part.