AI/TLDR

What Is Chroma? The Easiest Vector Store for Prototyping

Spin up vector search in three lines of Python and learn where Chroma shines — and where it stops scaling.

BEGINNER · 10 MIN READ · UPDATED 2026-06-12

In plain English

Chroma (also written ChromaDB) is an open-source vector store that runs directly inside your Python process. You pip install chromadb, write three lines of code, and you have a fully working similarity-search engine on your laptop — no server to start, no cloud account to create, no infrastructure to manage.

[Figure: Chroma diagram. Image source: kalilinuxtutorials.com]

The best analogy is a sticky-note board for meaning. Every piece of text you add gets converted to a list of numbers, its embedding, that captures what the text is about. Chroma pins each embedding onto an invisible coordinate board. When you ask a question, Chroma converts it to another embedding and finds the pins closest to it. That's semantic search: finding ideas near yours, not just words that match.

Because it runs in-process, Chroma is the fastest way to try that idea out. You don't ship anything or pay anything. When your prototype graduates to production, you can swap Chroma out for a managed service — but for the first 90% of development, it's unbeatable for speed.

Why it matters for builders

Before Chroma and similar tools existed, adding semantic search to a project meant spinning up Elasticsearch, Weaviate, or a cloud vector service — all of which require configuration, credentials, and operational overhead before you can write your first query. That setup cost killed a lot of experiments before they started.

Chroma removed that barrier. The killer use cases it unlocks for prototypers are:

  • RAG (Retrieval-Augmented Generation) — store your documents as embeddings and retrieve the most relevant chunks before calling an LLM, so the model answers from your data instead of hallucinating.
  • Semantic search — let users query a knowledge base, product catalog, or support corpus in natural language.
  • Duplicate / near-duplicate detection — find items that mean the same thing even when phrased differently.
  • Long-term agent memory — give an AI agent a persistent store of facts it can look up during a conversation.
  • Recommendation — surface items that are conceptually similar to what a user is currently viewing.

The common thread is: you need to search by meaning, not by keyword, and you need it working today. Chroma is purpose-built for that scenario.
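To make the RAG use case concrete, here is a minimal sketch of the step between retrieval and the LLM call: turning retrieved chunks into a prompt. The function name build_rag_prompt, the prompt wording, and the sample chunks are all illustrative assumptions; the only thing borrowed from Chroma is the shape of a query result, where "documents" holds one inner list per query.

```python
def build_rag_prompt(question: str, query_result: dict, max_chunks: int = 3) -> str:
    """Assemble an LLM prompt from a Chroma-style query result.

    `query_result` is assumed to look like the dict returned by
    collection.query(): {"documents": [[chunk, chunk, ...]], ...}.
    """
    # Take the chunks retrieved for the first (and only) query.
    chunks = query_result["documents"][0][:max_chunks]
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Hand-written stand-in for a real query result (no Chroma call is made here).
fake_result = {
    "documents": [[
        "Refunds are available within 30 days of purchase.",
        "Shipping takes 3 to 5 business days.",
    ]]
}
prompt = build_rag_prompt("How long do I have to request a refund?", fake_result)
print(prompt)
```

In a real pipeline the fake_result dictionary would be replaced by the return value of collection.query(), and the prompt string would be sent to whichever LLM you use.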

How Chroma works under the hood

Chroma's architecture comes down to three pieces that work together every time you add a document or run a query: collections, the HNSW index, and the client that owns the data.

Collections

The top-level concept in Chroma is a collection. Think of it as a table in a relational database whose rows are searched by vector similarity rather than by exact match. Each item in a collection has four parts: a unique id, the raw document text, a metadata dictionary of key-value pairs, and the embedding, which you can supply pre-computed or let Chroma generate for you if you omit it.

The HNSW index

Chroma uses HNSW (Hierarchical Navigable Small World), an approximate nearest-neighbour algorithm, to answer similarity queries in sub-linear time. HNSW builds a layered graph where each node connects to its closest neighbours; at query time it navigates the graph to find approximate top-k matches without checking every vector. The tradeoff is that results are approximate — extremely close but not guaranteed optimal — in exchange for speed that stays fast even as the collection grows to millions of items.
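For contrast, this is what the exact search that HNSW approximates looks like: a brute-force scan that measures the distance from the query to every stored vector. It is plain Python with no Chroma involved, and the function names and sample vectors are illustrative. The cost grows linearly with the number of vectors, which is the work HNSW avoids by walking its graph.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, vectors, k):
    """Return the k closest (distance, id) pairs, checking every vector."""
    scored = ((squared_l2(query, vec), vec_id) for vec_id, vec in vectors.items())
    return heapq.nsmallest(k, scored)


vectors = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 2.0],
    "d": [3.0, 3.0],
}
nearest = exact_top_k([0.9, 0.1], vectors, k=2)
print([vec_id for _, vec_id in nearest])
```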

The default distance metric is L2 (squared Euclidean distance). You can choose cosine or ip (inner product) when you create a collection, but you cannot change the metric after creation — plan ahead.
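The three metrics can rank the same vectors differently, so it helps to compute them by hand once. The sketch below assumes the definitions used by Chroma's underlying index: squared L2 as the sum of squared differences, cosine as one minus cosine similarity, and ip as one minus the dot product. The function names and sample vectors are illustrative.

```python
import math


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ip_distance(a, b):
    return 1.0 - sum(x * y for x, y in zip(a, b))


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 1.0]  # perpendicular to a

print(l2_squared(a, b), cosine_distance(a, b), ip_distance(a, b))  # 1.0 0.0 -1.0
print(l2_squared(a, c), cosine_distance(a, c), ip_distance(a, c))  # 2.0 1.0 1.0
```

Cosine ignores vector length, so a and b count as identical, while L2 and inner product are both sensitive to magnitude. That difference is the reason the metric has to be chosen with the embedding model in mind.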

Three client modes

Chroma ships three client classes that cover the full spectrum from throwaway script to multi-process server:

| Client class | Where data lives | Survives restart? | Best for |
|---|---|---|---|
| EphemeralClient() | RAM only | No | Unit tests, quick experiments |
| PersistentClient(path=...) | Disk (SQLite + HNSW files) | Yes | Local dev, single-user apps |
| HttpClient(host=...) | Separate Chroma server process | Yes (server-side) | Multi-client or containerised setups |

Quick start: from zero to semantic search

Here is the minimal path from a fresh environment to a working similarity query. The whole thing takes about two minutes.

bash
pip install chromadb

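The install pulls in the onnxruntime package, which lets Chroma generate embeddings locally without any API key. The all-MiniLM-L6-v2 model weights are not bundled with the package; they are downloaded and cached the first time Chroma embeds a document, so the first add or query needs a network connection.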

python
import chromadb

# In-memory client — data disappears when the script exits
client = chromadb.EphemeralClient()

# Create a collection (like a table)
collection = client.create_collection(name="docs")

# Add documents — Chroma embeds them automatically
collection.add(
    documents=[
        "Transformers use self-attention to model long-range dependencies.",
        "A convolutional network slides filters over a 2D grid.",
        "Reinforcement learning agents learn by maximising a reward signal.",
        "Diffusion models generate images by gradually denoising random noise.",
    ],
    ids=["t1", "t2", "t3", "t4"],
)

# Query by natural language
results = collection.query(
    query_texts=["how do language models pay attention?"],
    n_results=2,
)

print(results["documents"])
# One inner list per query text, closest match first. The transformer
# sentence should rank first; the second hit depends on the embedding model.

Adding metadata and filtering

Real applications rarely want pure semantic search. You almost always need to combine meaning-based ranking with structured constraints — "find me the most relevant article, but only from the last 30 days" or "only search docs tagged as public." Chroma handles this with the where clause, which filters on the metadata dictionary you attach to each document.

python
collection.add(
    documents=[
        "GPT-4 was released by OpenAI in March 2023.",
        "Gemini 1.5 Pro was released by Google in February 2024.",
        "Llama 3 was released by Meta in April 2024.",
    ],
    ids=["gpt4", "gemini", "llama3"],
    metadatas=[
        {"vendor": "openai", "year": 2023},
        {"vendor": "google", "year": 2024},
        {"vendor": "meta", "year": 2024},
    ],
)

# Semantic query restricted to 2024 releases only
results = collection.query(
    query_texts=["multimodal large language model"],
    n_results=2,
    where={"year": {"$gte": 2024}},  # metadata filter
)

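Metadata filters support the comparison operators $eq, $ne, $gt, $gte, $lt, and $lte, the set operators $in and $nin, and the logical combinators $and / $or. To filter on the document text itself, pass $contains (or $not_contains) through the separate where_document argument. These compose freely, so you can express complex filters without a separate SQL query.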

Chroma vs the alternatives

Chroma is one of several vector store options, each optimised for a different point on the scale-vs-simplicity curve. Here is how it fits relative to the tools most developers encounter first.

| | Chroma | Pinecone | pgvector | FAISS |
|---|---|---|---|---|
| Setup | pip install, zero config | Cloud account, API key | Postgres extension | C++ library, Python wrapper |
| Runs in-process | Yes | No (cloud API) | No (Postgres server) | Yes |
| Persistence | In-memory or SQLite on disk | Fully managed cloud | Postgres storage | Manual save/load |
| Horizontal scaling | Single node only | Fully distributed | Via Postgres replicas | No built-in clustering |
| Managed cloud option | Chroma Cloud (2025+) | Core product | Neon, Supabase, etc. | None |
| Best fit | Prototyping, local dev, RAG demos | Production at any scale | You already use Postgres | Research, offline batch |

A migration path many teams follow: prototype with Chroma (zero friction, works on a laptop), then evaluate a managed service such as Pinecone or Chroma Cloud once you know your dataset size and query volume. Most vector stores expose the same basic operations (add items with ids, vectors, and metadata, then query for the nearest neighbours), so porting your collection code is usually a small, contained job, although filter syntax and client setup differ between products.

Going deeper

Swap the embedding function

Chroma's default all-MiniLM-L6-v2 model is convenient but produces 384-dimensional vectors trained on general English text. For domain-specific corpora — code, medical records, multilingual content — you will get meaningfully better retrieval by swapping the embedding function. Chroma ships built-in wrappers for OpenAI, Cohere, Hugging Face, and Google Generative AI embeddings. Pass the wrapper when creating a collection and Chroma handles the rest:

python
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")

ef = OpenAIEmbeddingFunction(
    api_key="sk-...",
    model_name="text-embedding-3-small",
)

collection = client.get_or_create_collection(
    name="code_docs",
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"},  # cosine works well for OpenAI embeddings
)

HNSW tuning knobs

For latency-sensitive applications you can tune the HNSW index at collection-creation time via the metadata dictionary. The two most useful parameters are hnsw:M (the maximum number of connections per node, default 16; higher values improve recall but use more memory) and hnsw:construction_ef (the number of candidate nodes explored while building the index, default 100; higher values improve index quality at the cost of slower inserts). Some references call the second parameter ef_construction, but the metadata key is spelled hnsw:construction_ef. These cannot be changed after the collection is created, so set them before you start ingesting data.

The get_or_create_collection pattern

In production-like scripts you almost always want get_or_create_collection instead of create_collection. The former is idempotent: it creates the collection on the first run and silently returns the existing one on every subsequent run. Calling create_collection raises an error if the collection already exists, which breaks restarts and re-deployments.

Chroma Cloud for when you outgrow a single node

Chroma Inc. launched Chroma Cloud as a generally available managed service in 2025. It runs the same API as the open-source library against a distributed backend, so your existing collection.add() and collection.query() code works unchanged. Key additions include serverless billing (no nodes to provision), hybrid search combining vector similarity with metadata filtering, multi-region deployment across AWS and GCP, and SOC 2 Type II compliance for teams with security requirements. The migration path is intentionally simple: replace EphemeralClient() or PersistentClient() with a client pointed at your Chroma Cloud tenant (recent releases of the library provide chromadb.CloudClient for this), and the rest of your code is untouched.

What Chroma is not

Understanding the limits keeps you from hitting them by surprise. Chroma does not support: full-text BM25 keyword search (use Elasticsearch or Typesense for that), ACID transactions across multiple collections, relational joins, or row-level access control. It is a vector store, not a general-purpose database. For applications that need both semantic retrieval and rich relational queries, many teams use Chroma (or pgvector) alongside a traditional relational database rather than instead of one.

FAQ

Is ChromaDB free to use?

Yes. The open-source library (pip install chromadb) is free and Apache-2.0 licensed. Chroma Cloud, the managed hosted service launched in 2025, has a free tier and usage-based pricing for larger workloads.

Does Chroma work with JavaScript or TypeScript?

Yes. Chroma ships an official JavaScript/TypeScript client (npm install chromadb) that implements the same collection API. It connects to a running Chroma server over HTTP — there is no in-process mode for JS the way there is for Python.

How many vectors can Chroma handle on a single node?

Chroma's official guidance is that a single node can comfortably support tens of millions of vectors on appropriate hardware. Beyond roughly 50 million vectors, or under heavy concurrent write load, you should evaluate distributed alternatives like Chroma Cloud, Pinecone, or Milvus.

Do I need a GPU to run Chroma?

No. The default all-MiniLM-L6-v2 embedding model runs on CPU via ONNX Runtime and is fast enough for prototyping on a laptop. If you use a heavier embedding model (e.g. a large Hugging Face model), a GPU will speed up embedding generation, but it is never required just to run Chroma itself.

Can I use my own embeddings instead of letting Chroma generate them?

Yes. Pass a pre-computed embeddings list to collection.add() alongside your documents and ids. Chroma will store them as-is. This is the typical approach when you have already generated embeddings with an external service like OpenAI and want to avoid re-computing them.

What happens to my data when I use EphemeralClient?

All data is stored in RAM and is lost the moment your Python process exits. Use PersistentClient(path=...) if you want data to survive restarts. The API is identical — switching between the two is a one-line change.

Further reading