In plain English
Chroma (also written ChromaDB) is an open-source vector store that runs directly inside your Python process. You pip install chromadb, write three lines of code, and you have a fully working similarity-search engine on your laptop — no server to start, no cloud account to create, no infrastructure to manage.

The best analogy is a sticky-note board for meaning. Every piece of text you add gets converted to a cluster of numbers — its embedding — that captures what it's about. Chroma pins those clusters onto an invisible coordinate board. When you ask a question, Chroma converts it to another cluster and finds the pins closest to it. That's semantic search: finding ideas near yours, not just words that match.
Because it runs in-process, Chroma is the fastest way to try that idea out. You don't ship anything or pay anything. When your prototype graduates to production, you can swap Chroma out for a managed service — but for the first 90% of development, it's unbeatable for speed.
Why it matters for builders
Before Chroma and similar tools existed, adding semantic search to a project meant spinning up Elasticsearch, Weaviate, or a cloud vector service — all of which require configuration, credentials, and operational overhead before you can write your first query. That setup cost killed a lot of experiments before they started.
Chroma removed that barrier. The killer use cases it unlocks for prototypers are:
- RAG (Retrieval-Augmented Generation): store your documents as embeddings and retrieve the most relevant chunks before calling an LLM, so the model answers from your data and is less likely to hallucinate.
- Semantic search: let users query a knowledge base, product catalog, or support corpus in natural language.
- Duplicate / near-duplicate detection: find items that mean the same thing even when phrased differently.
- Long-term agent memory: give an AI agent a persistent store of facts it can look up during a conversation.
- Recommendation: surface items that are conceptually similar to what a user is currently viewing.
The common thread is: you need to search by meaning, not by keyword, and you need it working today. Chroma is purpose-built for that scenario.
How Chroma works under the hood
Three pieces work together every time you add a document or run a query: collections, the HNSW index, and the client that connects you to them.
Collections
The top-level concept in Chroma is a collection. Think of it as a table in a relational database whose rows are searched by vector similarity. Each item in a collection has four parts: a unique id, the raw document text, a metadata dictionary of key-value pairs (scalar values such as strings, numbers, and booleans), and the embedding, which you can supply pre-computed or let Chroma generate from the document.
The HNSW index
Chroma uses HNSW (Hierarchical Navigable Small World), an approximate nearest-neighbour algorithm, to answer similarity queries in sub-linear time. HNSW builds a layered graph where each node connects to its closest neighbours; at query time it navigates the graph to find approximate top-k matches without checking every vector. The tradeoff is that results are approximate (usually the true nearest neighbours, but not guaranteed) in exchange for query latency that stays low even as the collection grows to millions of items.
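To make the graph-navigation idea concrete, here is a deliberately simplified sketch in plain Python. It is not HNSW and not Chroma's implementation: it uses a single layer, a brute-force graph build, and a plain greedy walk, whereas real HNSW adds a hierarchy of layers and a candidate list. The function names and parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_graph(points, m):
    """Link every point to its m closest neighbours (brute force, build time only)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: sq_dist(p, points[j]),
        )
        graph[i] = others[:m]
    return graph


def greedy_search(points, graph, query, entry=0):
    """Hop to the neighbour closest to the query until no neighbour improves."""
    current = entry
    evaluated = 1  # distance checks performed so far (the entry point)
    while True:
        best = min(graph[current], key=lambda j: sq_dist(query, points[j]))
        evaluated += len(graph[current])
        if sq_dist(query, points[best]) >= sq_dist(query, points[current]):
            return current, evaluated
        current = best


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(500)]
graph = build_graph(points, m=8)
query = (0.5, 0.5)

found, evaluated = greedy_search(points, graph, query)
exact = min(range(len(points)), key=lambda j: sq_dist(query, points[j]))
print(f"greedy hit {found} after {evaluated} distance checks; exact answer is {exact}")
```

The greedy walk can stop at a local minimum that is close to, but not exactly, the true nearest neighbour. That is the approximation the paragraph above describes, and it is what HNSW's extra layers and wider candidate list are designed to reduce.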
The default distance metric is l2 (squared Euclidean distance). You can choose cosine or ip (inner product) by setting hnsw:space in the collection metadata when you create it, but you cannot change the metric after creation, so plan ahead.
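The three metrics are easy to compare by hand. The sketch below reimplements them in plain Python following the usual definitions (squared Euclidean, one minus the dot product, one minus the cosine similarity); it is an illustration of the arithmetic, not Chroma's internal code, and the function names are simply chosen to mirror the metric names.

```python
import math


def l2(a, b):
    """Squared Euclidean distance (no square root is taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip(a, b):
    """Inner-product distance: 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


a, b = (1.0, 2.0), (2.0, 4.0)  # same direction, different length
print(l2(a, b))      # 5.0: the vectors are far apart in absolute terms
print(ip(a, b))      # -9.0: a large dot product gives a very small "distance"
print(cosine(a, b))  # approximately 0: the direction is identical
```

The same pair of vectors looks distant under l2 and identical under cosine, which is why the choice matters when embeddings are not normalised to unit length.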
Three client modes
Chroma ships three client classes that cover the full spectrum from throwaway script to multi-process server:
| Client class | Where data lives | Survives restart? | Best for |
|---|---|---|---|
| EphemeralClient() | RAM only | No | Unit tests, quick experiments |
| PersistentClient(path=...) | Disk (SQLite + HNSW files) | Yes | Local dev, single-user apps |
| HttpClient(host=...) | Separate Chroma server process | Yes (server-side) | Multi-client or containerised setups |
Quick start: from zero to semantic search
Here is the minimal path from a fresh environment to a working similarity query. The whole thing takes about two minutes.

    pip install chromadb

The install pulls in the onnxruntime package. The all-MiniLM-L6-v2 model weights are downloaded and cached the first time Chroma needs to embed something, after which embeddings are generated locally without any API key.

    import chromadb

    # In-memory client: data disappears when the script exits
    client = chromadb.EphemeralClient()

    # Create a collection (like a table)
    collection = client.create_collection(name="docs")

    # Add documents; Chroma embeds them automatically
    collection.add(
        documents=[
            "Transformers use self-attention to model long-range dependencies.",
            "A convolutional network slides filters over a 2D grid.",
            "Reinforcement learning agents learn by maximising a reward signal.",
            "Diffusion models generate images by gradually denoising random noise.",
        ],
        ids=["t1", "t2", "t3", "t4"],
    )

    # Query by natural language
    results = collection.query(
        query_texts=["how do language models pay attention?"],
        n_results=2,
    )
    print(results["documents"])
    # The transformer sentence should rank first; the second hit can vary
    # with the embedding model. Example output:
    # [['Transformers use self-attention to model long-range dependencies.',
    #   'Diffusion models generate images by gradually denoising random noise.']]

Adding metadata and filtering
Real applications rarely want pure semantic search. You almost always need to combine meaning-based ranking with structured constraints — "find me the most relevant article, but only from the last 30 days" or "only search docs tagged as public." Chroma handles this with the where clause, which filters on the metadata dictionary you attach to each document.

    collection.add(
        documents=[
            "GPT-4 was released by OpenAI in March 2023.",
            "Gemini 1.5 Pro was released by Google in February 2024.",
            "Llama 3 was released by Meta in April 2024.",
        ],
        ids=["gpt4", "gemini", "llama3"],
        metadatas=[
            {"vendor": "openai", "year": 2023},
            {"vendor": "google", "year": 2024},
            {"vendor": "meta", "year": 2024},
        ],
    )

    # Semantic query restricted to 2024 releases only
    results = collection.query(
        query_texts=["multimodal large language model"],
        n_results=2,
        where={"year": {"$gte": 2024}},  # metadata filter
    )

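Supported metadata operators include $eq, $ne, $gt, $gte, $lt, $lte, $in, and $nin, plus the logical combinators $and / $or. These compose freely, so you can express complex filters without a separate SQL query. To filter on the document text itself, use the separate where_document argument, which is where $contains belongs.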
Chroma vs the alternatives
Chroma is one of several vector store options, each optimised for a different point on the scale-vs-simplicity curve. Here is how it fits relative to the tools most developers encounter first.
| | Chroma | Pinecone | pgvector | FAISS |
|---|---|---|---|---|
| Setup | pip install, zero config | Cloud account, API key | Postgres extension | C++ library, Python wrapper |
| Runs in-process | Yes | No (cloud API) | No (Postgres server) | Yes |
| Persistence | In-memory or SQLite on disk | Fully managed cloud | Postgres storage | Manual save/load |
| Horizontal scaling | Single node only | Fully distributed | Via Postgres replicas | No built-in clustering |
| Managed cloud option | Chroma Cloud (2025+) | Core product | Neon, Supabase, etc. | None |
| Best fit | Prototyping, local dev, RAG demos | Production at any scale | You already use Postgres | Research, offline batch |
The migration path that many teams follow is: Chroma for prototyping (zero friction, works on a laptop), then evaluate a managed service like Pinecone or Chroma Cloud once you know your dataset size and query volume. The core concepts (collections, ids, embeddings, metadata filters) map closely onto other vector stores, so porting your collection code is usually a small, contained job.
Going deeper
Swap the embedding function
Chroma's default all-MiniLM-L6-v2 model is convenient, but it is a small model trained on general English text that produces 384-dimensional vectors. For domain-specific corpora such as code, medical records, or multilingual content, you will often get meaningfully better retrieval by swapping the embedding function. Chroma ships built-in wrappers for OpenAI, Cohere, Hugging Face, and Google Generative AI embeddings. Pass the wrapper when creating a collection and Chroma handles the rest:

    import chromadb
    from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

    client = chromadb.PersistentClient(path="./chroma_db")

    ef = OpenAIEmbeddingFunction(
        api_key="sk-...",
        model_name="text-embedding-3-small",
    )

    collection = client.get_or_create_collection(
        name="code_docs",
        embedding_function=ef,
        metadata={"hnsw:space": "cosine"},  # cosine works well for OpenAI embeddings
    )

HNSW tuning knobs
For latency-sensitive applications you can tune the HNSW index at collection-creation time via the metadata dictionary. The two most useful parameters are hnsw:M (maximum number of connections per node, default 16; higher values mean better recall but more memory) and hnsw:construction_ef (size of the candidate list explored while inserting each vector, default 100; higher values build a better graph but slow down inserts). Set both before you start ingesting data, because they cannot be changed after the collection is created. Newer Chroma releases also accept these settings through a dedicated configuration argument, so check the documentation for the version you have installed.
The get_or_create_collection pattern
In production-like scripts you almost always want get_or_create_collection instead of create_collection. The former is idempotent: it creates the collection on the first run and silently returns the existing one on every subsequent run. Using create_collection will throw an error if the collection already exists, which breaks restarts and re-deployments.
Chroma Cloud for when you outgrow a single node
Chroma Inc. launched Chroma Cloud as a generally available managed service in 2025. It runs the same API as the open-source library against a distributed backend, so your existing collection.add() and collection.query() code works unchanged. Key additions include serverless billing (no nodes to provision), hybrid search combining vector similarity with metadata filtering, multi-region deployment across AWS and GCP, and SOC 2 Type II compliance for teams with security requirements. The migration path is intentionally simple: replace EphemeralClient() or PersistentClient() with a client pointed at your Chroma Cloud tenant (recent releases provide chromadb.CloudClient for this), and the rest of your code is untouched.
What Chroma is not
Understanding the limits keeps you from hitting them by surprise. Chroma does not support: full-text BM25 keyword search (use Elasticsearch or Typesense for that), ACID transactions across multiple collections, relational joins, or row-level access control. It is a vector store, not a general-purpose database. For applications that need both semantic retrieval and rich relational queries, many teams use Chroma (or pgvector) alongside a traditional relational database rather than instead of one.
FAQ
Is ChromaDB free to use?
Yes. The open-source library (pip install chromadb) is free and Apache-2.0 licensed. Chroma Cloud, the managed hosted service launched in 2025, has a free tier and usage-based pricing for larger workloads.
Does Chroma work with JavaScript or TypeScript?
Yes. Chroma ships an official JavaScript/TypeScript client (npm install chromadb) that implements the same collection API. It connects to a running Chroma server over HTTP — there is no in-process mode for JS the way there is for Python.
How many vectors can Chroma handle on a single node?
Chroma's official guidance is that a single node can comfortably support tens of millions of vectors on appropriate hardware. Beyond roughly 50 million vectors, or under heavy concurrent write load, you should evaluate distributed alternatives like Chroma Cloud, Pinecone, or Milvus.
Do I need a GPU to run Chroma?
No. The default all-MiniLM-L6-v2 embedding model runs on CPU via ONNX Runtime and is fast enough for prototyping on a laptop. If you use a heavier embedding model (e.g. a large Hugging Face model), a GPU will speed up embedding generation, but it is never required just to run Chroma itself.
Can I use my own embeddings instead of letting Chroma generate them?
Yes. Pass a pre-computed embeddings list to collection.add() alongside your documents and ids. Chroma will store them as-is. This is the typical approach when you have already generated embeddings with an external service like OpenAI and want to avoid re-computing them.
What happens to my data when I use EphemeralClient?
All data is stored in RAM and is lost the moment your Python process exits. Use PersistentClient(path=...) if you want data to survive restarts. The API is identical — switching between the two is a one-line change.