In plain English
Before a RAG system can search your documents, it has to cut them into smaller pieces. A 300-page manual is too big to embed or retrieve as one block, so you split it into short passages — this is chunking. Every chunk becomes one searchable unit, and the quality of those cuts quietly decides how good your retrieval can ever be.

Chonkie is a small, fast open-source Python library that does exactly this one job well: it takes raw text and splits it into clean chunks, using whichever strategy fits your data. Instead of writing fiddly splitting code by hand or pulling in a giant framework just to cut text, you import Chonkie, pick a chunker, and feed it your document.
Think of it like a good kitchen knife and cutting board for text. A whole roast won't fit on a fork, and hacking at it randomly leaves you with ragged, useless pieces. A sharp, purpose-built tool gives you even, sensible portions every time. Chonkie is that tool — it's not the model, the database, or the meal; it's the thing that prepares the ingredients so everything downstream works.
Why it matters
Chunking sits at the very front of a RAG pipeline, and everything after it inherits its mistakes. If a chunk splits a sentence in half, both halves lose meaning. If a chunk crams three unrelated topics together, its embedding becomes a blurry average that matches nothing well. Get chunking wrong and no retriever, reranker, or model can fully recover — the right answer was never packaged into a findable unit.
So why a dedicated library instead of a few lines of text.split()? Because good chunking is harder than it looks, and the naive versions fail in predictable ways.
- Splitting by characters breaks words and tokens. Cutting at a fixed character count slices through the middle of words and multi-token sequences, producing chunks that embed poorly and waste model budget.
- Splitting by paragraphs gives wildly uneven sizes. One paragraph is a sentence; the next is a page. Embeddings work best on similarly sized, self-contained passages.
- Different content needs different strategies. Prose, transcripts, and dense reference text don't all chunk the same way, and rewriting a custom splitter for each is tedious and error-prone.
A focused library solves this once, correctly. Chonkie gives you several proven chunking strategies behind one consistent interface, handles the token-counting and boundary logic for you, and stays small enough that adding it to a project doesn't drag in a heavy dependency tree. The payoff is concrete: cleaner chunks raise retrieval precision, which is one of the biggest levers on RAG answer quality.
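To make the first two failure modes concrete, here is a small standard-library sketch. It does not use Chonkie at all; the sample text and the 20-character window are made up purely for illustration.

```python
# Illustrative text only; the sentences are invented for this demo.
text = (
    "Chunking matters.\n\n"
    "Retrieval quality depends on how documents are split, because each "
    "chunk is embedded and searched as one unit."
)

# Naive approach 1: fixed character windows. Cuts land wherever the
# count says, including the middle of a word.
size = 20
char_chunks = [text[i:i + size] for i in range(0, len(text), size)]

# Naive approach 2: split on blank lines. Sizes follow the author's
# paragraphing, so they can differ widely.
para_chunks = text.split("\n\n")
para_lengths = [len(p) for p in para_chunks]

print(char_chunks[:2])   # the word "Retrieval" is split across two chunks
print(para_lengths)      # one short paragraph, one much longer one
```

Neither result is wrong as a split, but neither gives the embedding model clean, similarly sized passages to work with.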
How it works
Chonkie's whole model is simple: you create a chunker configured for a target chunk size (usually measured in tokens, not characters), then call it on your text. It walks the document, finds sensible boundaries, and returns a list of chunk objects — each holding the chunk text plus metadata like token count and start/end position. You then embed those chunks and store them in a vector database, exactly as any RAG ingestion step would.
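As a rough mental model of what comes back, the sketch below defines a hypothetical `Chunk` record with the fields described above. It is a stand-in written for this article, not Chonkie's own class, and it assumes whitespace-separated words as a crude proxy for tokens.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk object; not Chonkie's class."""
    text: str
    start_index: int  # offset of the first character in the source
    end_index: int    # offset one past the last character
    token_count: int

def count_tokens(s: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(s.split())

def make_chunk(document: str, start: int, end: int) -> Chunk:
    piece = document[start:end]
    return Chunk(piece, start, end, count_tokens(piece))

document = "Install the pump. Prime it before first use."
split = document.index(". ") + 1          # cut just after the first sentence
first = make_chunk(document, 0, split)
second = make_chunk(document, split + 1, len(document))

# The offsets let you map any chunk back to its place in the source.
print(first)
print(document[second.start_index:second.end_index] == second.text)
```

Keeping the offsets alongside the text is what lets a RAG system show a citation or pull in neighbouring context later.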
The chunker family
The core idea is that one chunker type does not fit all text, so Chonkie ships a family of them. They form a ladder from cheap-and-mechanical to smart-and-expensive:
- Token chunker — splits purely by token count into fixed-size windows, usually with some overlap between neighbours. Fast and predictable; it ignores meaning, so a boundary can land mid-thought.
- Sentence chunker — respects sentence boundaries, packing whole sentences into a chunk until it nears the size limit. Never cuts a sentence in half, which already fixes the most common naive-splitting bug.
- Recursive chunker — splits along a hierarchy of separators (sections, then paragraphs, then sentences), backing off to finer cuts only when a piece is still too big. A strong, general-purpose default.
- Semantic chunker — embeds candidate units and groups consecutive ones that are similar in meaning, starting a new chunk where the topic shifts. Boundaries follow ideas, not punctuation (see semantic chunking).
- LLM-based chunker — asks a language model where to cut, letting it reason about structure and topic. The most context-aware option, and the slowest and most costly.
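The recursive idea is the easiest rung of the ladder to sketch in plain Python. The function below is a simplified illustration, not Chonkie's implementation: it counts size in words rather than tokens, uses a made-up separator list, drops the separators it splits on, and does not merge small neighbours back together the way a production chunker typically would.

```python
def recursive_split(text, max_words, separators=("\n\n", "\n", ". ")):
    """Split on the coarsest separator first and recurse only into
    pieces that are still too long. Size is counted in words here."""
    if len(text.split()) <= max_words or not separators:
        return [text]  # small enough, or nothing finer left to try
    coarsest, *finer = separators
    pieces = []
    for part in text.split(coarsest):
        if part.strip():  # skip empty fragments
            pieces.extend(recursive_split(part, max_words, tuple(finer)))
    return pieces

manual = (
    "Setup\n"
    "Unpack the box. Plug in the cable. Wait for the green light.\n\n"
    "Safety\n"
    "Keep dry."
)
pieces = recursive_split(manual, max_words=6)
for p in pieces:
    print(repr(p))
```

Notice that the short "Safety" section survives as one piece, while the longer "Setup" section is cut down to sentence level. That backing-off behaviour is the whole point of the strategy.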
Two ideas appear in almost every chunker. Chunk size is the target length, usually in tokens, that you want each piece to be. Overlap repeats a little text from the end of one chunk at the start of the next, so a fact that straddles a boundary still lands wholly inside at least one chunk. Both are knobs you tune — see chunk size and overlap.
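Here is a minimal sketch of how size and overlap interact, using a sliding window over a token list. The function name and the whitespace "tokens" are assumptions for illustration; a real chunker would count tokens with a proper tokenizer.

```python
def token_windows(tokens, chunk_size, overlap):
    """Fixed-size windows where each one repeats the last `overlap`
    tokens of the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end
    return windows

tokens = "press and hold the red button for five seconds to reset".split()
fact = "red button for five seconds"

no_overlap = token_windows(tokens, chunk_size=6, overlap=0)
with_overlap = token_windows(tokens, chunk_size=6, overlap=2)

# Without overlap the fact is cut in two; with overlap one window holds it.
print([fact in " ".join(w) for w in no_overlap])
print([fact in " ".join(w) for w in with_overlap])
```

The trade-off is storage and embedding cost: every overlapping token is embedded twice, so larger overlaps buy safety at the boundary with a bigger index.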
Because every chunker shares the same interface, swapping one for another is usually a one-line change. That makes Chonkie a place to experiment: start with a fast chunker, measure retrieval quality, and only reach for a heavier strategy if the numbers say you need it.
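The value of a shared interface is easy to see even with toy chunkers. In this sketch both functions are hypothetical stand-ins (neither comes from Chonkie); what matters is that they accept a string and return a list of strings, so the comparison loop never changes.

```python
import re
from typing import Callable, Dict, List

Chunker = Callable[[str], List[str]]  # the shared "shape" of every chunker

def fixed_words(text: str, size: int = 8) -> List[str]:
    """Toy mechanical chunker: fixed windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def by_sentence(text: str) -> List[str]:
    """Toy boundary-aware chunker: one chunk per sentence."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

chunkers: Dict[str, Chunker] = {"fixed": fixed_words, "sentence": by_sentence}

text = "Open the lid. Add water to the line. Close the lid and press start."

# Swapping strategy means changing a dictionary key, nothing else.
report = {name: [len(c.split()) for c in fn(text)] for name, fn in chunkers.items()}
print(report)
```

In a real experiment the loop body would embed each chunk list and score retrieval, but the structure stays this simple.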
A minimal example in code
The whole API is small enough to show in a few lines. You pick a chunker, set a size, and call it. Here is the common starting point, a recursive chunker with a target chunk size:
from chonkie import RecursiveChunker
# Create a chunker: aim for chunks of up to ~512 tokens.
# chunk_size is measured with the chunker's configured tokenizer,
# so check the default in your installed version.
chunker = RecursiveChunker(chunk_size=512)
with open("handbook.md", encoding="utf-8") as f:
    text = f.read()
# Split the document into chunk objects.
chunks = chunker(text)
for c in chunks:
    print(c.token_count, "tokens:", c.text[:60], "...")
    # c.text -> the chunk's text (what you embed)
    # c.token_count -> length, handy for budgeting
    # c.start_index / c.end_index -> where it came from
Trying a different strategy is just a different import and class; the call site stays the same, which is exactly what makes A/B testing chunkers easy:
from chonkie import SemanticChunker
# Same shape of call; smarter boundaries based on meaning.
# The semantic chunker needs an embedding model, which is an optional extra.
chunker = SemanticChunker(chunk_size=512)
# long_article is any long string you have already loaded.
chunks = chunker(long_article)
# Hand the chunk texts to your embedding model + vector store next.
texts = [c.text for c in chunks]
Choosing the right chunker
There is no universally best chunker — only the right one for your content, budget, and quality bar. The honest rule of thumb: start simple, measure, and climb the ladder only when retrieval evaluation shows a real gain. Smarter chunkers cost more time and money on every ingest, and sometimes a sentence chunker matches a semantic one on quality while being far cheaper.
| Chunker | How it decides cuts | Speed / cost | Good fit for |
|---|---|---|---|
| Token | Fixed token windows | Fastest, no model calls | Huge corpora, quick baselines |
| Sentence | Sentence boundaries | Fast, no model calls | Clean prose, transcripts |
| Recursive | Structure hierarchy | Fast, no model calls | A strong general default |
| Semantic | Embedding similarity | Slower, embed cost | Topic-shifting mixed docs |
| LLM-based | A model reasons about it | Slowest, token cost | High-value, tricky content |
A practical workflow: chunk with the recursive chunker first, since it respects document structure without any model calls. Build a small RAG evaluation set and measure retrieval precision and recall. If specific failures look like topic-bleed across chunk boundaries, try the semantic chunker on just that content and compare. Let the evaluation numbers, not vibes, decide whether the extra cost is worth it.
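Measuring retrieval does not need heavy tooling to get started. The sketch below computes per-query precision and recall over chunk IDs and averages them. The evaluation set is invented for illustration; in practice you would label which chunks are relevant to each question and run this once per chunking strategy.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved chunks that are relevant.
    Recall: share of relevant chunks that were retrieved."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical results for two queries (chunk IDs are made up).
eval_set = [
    {"retrieved": ["c1", "c4", "c7"], "relevant": ["c1", "c2"]},
    {"retrieved": ["c3", "c5"], "relevant": ["c3", "c5"]},
]

scores = [precision_recall(q["retrieved"], q["relevant"]) for q in eval_set]
mean_precision = sum(p for p, _ in scores) / len(scores)
mean_recall = sum(r for _, r in scores) / len(scores)
print(f"precision={mean_precision:.2f} recall={mean_recall:.2f}")
```

One caveat when comparing chunkers this way: chunk IDs change whenever the chunking changes, so relevance labels are usually best tied to source spans or answer text and mapped onto chunks after each run.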
Going deeper
Chonkie covers chunking well, but real ingestion has a few extra concerns worth knowing once the basics click.
Chunking is only one stage. A library that splits text does not parse it. You still need to extract clean text from PDFs, HTML, or slides before chunking — garbage in, garbage chunks out (see parsing PDFs for RAG and cleaning data before RAG). Code, tables, and Markdown also have their own chunking gotchas, because splitting them like prose destroys their structure — see chunking code, tables, and Markdown.
Beyond plain chunkers. Modern chunking libraries are growing past pure splitting. Some add refinery steps that post-process chunks, for example attaching surrounding context or metadata to each piece so a retrieved chunk carries enough background to stand alone. That idea connects directly to contextual retrieval, where each chunk is prepended with a short description of where it sits in the document, a step reported to improve retrieval accuracy.
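A post-processing step of this kind can be sketched in a few lines. The function below is a hypothetical illustration, not Chonkie's refinery API. Contextual retrieval usually has a language model write the description; here a fixed template built from the document title and section heading stands in for that step.

```python
def add_context(chunks, doc_title, section_of):
    """Prefix each chunk with a short 'where am I' line.
    section_of[i] is the section heading that chunk i came from."""
    if len(chunks) != len(section_of):
        raise ValueError("need exactly one section label per chunk")
    return [
        f"From '{doc_title}', section '{section}': {chunk}"
        for chunk, section in zip(chunks, section_of)
    ]

chunks = ["Hold the button for five seconds.", "Replace the filter monthly."]
refined = add_context(chunks, "Pump Handbook", ["Resetting", "Maintenance"])
for r in refined:
    print(r)
```

You would embed the refined text rather than the raw chunk, so a query such as "how do I reset the pump" has the words "pump" and "resetting" to match against even though the original chunk never mentions them.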
Token counting matters. Chunk size in tokens depends on a tokenizer, and different models tokenize differently. A chunk sized for one model's tokenizer may be slightly off for another. For most RAG work the difference is minor, but if you are tightly budgeting a context window, make sure your chunker counts tokens with a tokenizer close to the model you will actually feed.
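To see why the tokenizer matters, compare two deliberately crude counters on the same sentence. Both are assumptions made up for this example and neither matches any real model's tokenizer; the point is only that "how many tokens" has no single answer.

```python
import re

def count_words(text):
    """Counter A (assumed): whitespace-separated words."""
    return len(text.split())

def count_words_and_punct(text):
    """Counter B (assumed): word runs and punctuation marks counted separately."""
    return len(re.findall(r"\w+|[^\w\s]", text))

sample = "Don't over-tighten the M6 bolts (max 9.5 Nm)."
budget = 10  # hypothetical per-chunk token limit

for name, counter in [("A", count_words), ("B", count_words_and_punct)]:
    n = counter(sample)
    print(name, n, "fits" if n <= budget else "over budget")
```

The same sentence fits the budget under one counter and overshoots it under the other, which is the mismatch to watch for when the chunker's tokenizer differs from the model's.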
Where to go next. Chunking choices only prove their worth against measurement, so pair Chonkie with a real evaluation loop — compare strategies on retrieval precision and recall rather than guessing. And remember the durable lesson of RAG: the retriever can only return chunks you created well, so time spent on chunking pays off across every query that follows. To go broader on the strategy space itself, read chunking strategies compared.
FAQ
What is Chonkie used for?
Chonkie is a lightweight Python library for splitting long text into chunks for RAG ingestion. You use it at the start of a pipeline to turn documents into clean, similarly sized passages that you then embed and store in a vector database. It offers several strategies — token, sentence, recursive, semantic, and LLM-based chunkers — behind one consistent interface.
How is Chonkie different from chunking in LangChain or LlamaIndex?
Chonkie is a focused, standalone library that does only chunking, with a deliberately small dependency footprint and a fast, simple API. Bigger frameworks bundle chunking inside a larger toolkit alongside retrieval, agents, and prompting. If you only need good chunking — or want it independent of a framework — a dedicated library keeps your project lean.
What chunking strategies does Chonkie support?
The main chunkers are token (fixed token windows), sentence (respects sentence boundaries), recursive (splits along a hierarchy of separators), semantic (groups text by meaning using embeddings), and LLM-based (a model decides where to cut). They trade off speed and cost against how context-aware the boundaries are.
Which Chonkie chunker should I start with?
Start with the recursive chunker. It respects document structure, needs no model calls, and works well as a general default. Only move to the semantic or LLM-based chunkers if a retrieval evaluation shows that smarter, topic-aware boundaries actually improve your results enough to justify the extra cost.
Does chunking really affect RAG quality that much?
Yes. Chunking is the first stage of the pipeline, so every later step inherits its mistakes. Chunks that split sentences or mix unrelated topics produce poor embeddings, and the right passage may never be packaged into a findable unit. Cleaner chunks raise retrieval precision, which is one of the biggest levers on final answer quality.
Is Chonkie free and open source?
Yes, Chonkie is an open-source library you can install and use freely. The base install is intentionally minimal, and heavier features — like the embedding models a semantic chunker needs — are optional extras you add only when a given chunker requires them.