What Is Chroma? The Easiest Vector Store for Prototyping

Spin up vector search in three lines of Python and learn where Chroma shines — and where it stops scaling.

Beginner | 10 min read | Updated 2026-06-12

In plain English

Chroma (also written ChromaDB) is an open-source vector store that runs directly inside your Python process. You run pip install chromadb, write a few lines of code, and you have a fully working similarity-search engine on your laptop: no server to start, no cloud account to create, no infrastructure to manage.

The best analogy is a sticky-note board for meaning. Every piece of text you add is converted into a list of numbers, its embedding, that captures what the text is about. Chroma pins each embedding onto an invisible coordinate board, so texts with similar meanings land close together. When you ask a question, Chroma converts it into an embedding too and finds the pins nearest to it. That's semantic search: finding ideas near yours, not just words that match.
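
Here is the same idea as a toy in plain Python. The three-number "embeddings" are made up by hand for illustration (a real model such as all-MiniLM-L6-v2 learns 384 dimensions), and the search is a brute-force scan over every item rather than the index Chroma actually uses:

```python
import math

# Hand-made, hypothetical 3-d "embeddings"; real models learn hundreds of dimensions.
notes = {
    "How to bake sourdough bread": [0.9, 0.1, 0.0],
    "Tips for proofing yeast dough": [0.8, 0.2, 0.1],
    "Fixing a flat bicycle tyre": [0.0, 0.1, 0.9],
}

def nearest(query_vec, items, k=2):
    """Return the k texts whose vectors are closest to query_vec (Euclidean distance)."""
    ranked = sorted(items, key=lambda text: math.dist(items[text], query_vec))
    return ranked[:k]

# Pretend this is the embedding of the question "making bread at home".
query = [0.88, 0.12, 0.02]
print(nearest(query, notes))
# ['How to bake sourdough bread', 'Tips for proofing yeast dough']
```

The yeast note shares no words with the question, yet it ranks above the bicycle note because its vector sits nearby.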

Because it runs in-process, Chroma is the fastest way to try that idea out: there is nothing to deploy and nothing to pay for. When your prototype graduates to production, you can swap Chroma out for a managed service, but for the bulk of early development it is hard to beat for iteration speed.

Why it matters for builders

Before Chroma and similar tools existed, adding semantic search to a project meant spinning up Elasticsearch, Weaviate, or a cloud vector service — all of which require configuration, credentials, and operational overhead before you can write your first query. That setup cost killed a lot of experiments before they started.

Chroma removed that barrier. The killer use cases it unlocks for prototypers are:

RAG (Retrieval-Augmented Generation): store your documents as embeddings and retrieve the most relevant chunks before calling an LLM, so the model answers from your data rather than from memory alone, which reduces hallucination.
Semantic search — let users query a knowledge base, product catalog, or support corpus in natural language.
Duplicate / near-duplicate detection — find items that mean the same thing even when phrased differently.
Long-term agent memory — give an AI agent a persistent store of facts it can look up during a conversation.
Recommendation — surface items that are conceptually similar to what a user is currently viewing.

The common thread is: you need to search by meaning, not by keyword, and you need it working today. Chroma is purpose-built for that scenario.

How Chroma works under the hood

Every time you add a document or run a query, the request passes through the same pipeline of components:

// Chroma request lifecycle

Your text: the raw string passed to collection.add() or collection.query()
Embedding function: converts text to a float vector (default: all-MiniLM-L6-v2, 384 dimensions)
HNSW index: approximate nearest-neighbour graph, held in memory and persisted to disk
SQLite metadata store: documents, IDs, and metadata fields persisted to chroma.sqlite3
Results: top-k documents, distances, and metadata returned to your code

Collections

The top-level concept in Chroma is a collection: think of it as a table in a relational database whose rows are built around vectors. Each item in a collection has four parts: a unique id, the raw document text, a metadata dictionary of arbitrary key-value pairs, and its embedding. You can supply the embedding yourself or omit it, in which case Chroma generates it from the document text using the collection's embedding function.

The HNSW index

Chroma uses HNSW (Hierarchical Navigable Small World), an approximate nearest-neighbour algorithm, to answer similarity queries in sub-linear time. HNSW builds a layered graph in which each node links to its closest neighbours; at query time it walks the graph from a sparse top layer down to the dense bottom layer, homing in on the approximate top-k matches without comparing the query against every vector. The tradeoff is that results are approximate (usually very close to the true nearest neighbours, but not guaranteed to be exact) in exchange for query latency that stays low even as the collection grows to millions of items.
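
The navigation step can be sketched in a few dozen lines. This is a simplified, single-layer version written for illustration, assuming a small brute-force neighbour graph; real HNSW (and the hnswlib-based index Chroma uses) adds the layer hierarchy, smarter neighbour selection and incremental inserts. The `ef` argument plays the role of HNSW's candidate-list size: a larger value explores more of the graph, trading speed for recall.

```python
import heapq
import math

def build_knn_graph(points, m):
    """Link every point to its m nearest neighbours (brute force, fine for a toy)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in others[:m]]
    return graph

def greedy_search(points, graph, query, entry, ef=4, k=1):
    """Best-first search over one graph layer, keeping at most ef results."""
    d0 = math.dist(points[entry], query)
    visited = {entry}
    to_expand = [(d0, entry)]  # min-heap: closest unexpanded node first
    found = [(-d0, entry)]     # max-heap via negation: the ef best nodes so far
    while to_expand:
        d, node = heapq.heappop(to_expand)
        if d > -found[0][0]:
            break  # nothing left to expand can beat the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(points[nb], query)
            if len(found) < ef or dn < -found[0][0]:
                heapq.heappush(to_expand, (dn, nb))
                heapq.heappush(found, (-dn, nb))
                if len(found) > ef:
                    heapq.heappop(found)  # drop the current worst result
    return [i for _, i in sorted((-nd, i) for nd, i in found)][:k]

# Ten points on a line; start the search at the far end.
points = [(float(x),) for x in range(10)]
graph = build_knn_graph(points, m=2)
print(greedy_search(points, graph, query=(7.2,), entry=0, k=2))  # [7, 8]
```

The search only ever computes distances to nodes it reaches through graph links, which is where the savings over a full scan come from on large collections.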

The default distance metric is l2 (squared Euclidean distance). You can choose cosine or ip (inner product) instead by setting hnsw:space in the collection's metadata when you create it, but you cannot change the metric after creation, so plan ahead.
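
For intuition, here is how each metric turns two vectors into a distance, following the formulas as Chroma's documentation describes them (smaller always means more similar). It is a plain-Python sketch; Chroma computes these inside its index.

```python
import math

def squared_l2(a, b):
    # "l2" (the default): sum of squared differences, no square root taken
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product_distance(a, b):
    # "ip": 1 minus the dot product, so larger dot products give smaller distances
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # "cosine": 1 minus cosine similarity; only direction matters, not length
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

a, b = [1.0, 2.0], [2.0, 4.0]  # same direction, different length
print(squared_l2(a, b))              # 5.0
print(inner_product_distance(a, b))  # -9.0
print(cosine_distance(a, b))         # ~0.0: identical direction
```

The three metrics disagree about this pair: l2 sees two different points, cosine sees the same direction, and ip even goes negative because the vectors are not normalised. That disagreement is why the choice has to match how your embedding model was trained.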

Three client modes

Chroma ships three client classes that cover the full spectrum from throwaway script to multi-process server:

Client class	Where data lives	Survives restart?	Best for
`EphemeralClient()`	RAM only	No	Unit tests, quick experiments
`PersistentClient(path=...)`	Disk (SQLite + HNSW files)	Yes	Local dev, single-user apps
`HttpClient(host=...)`	Separate Chroma server process	Yes (server-side)	Multi-client or containerised setups

Quick start: from zero to semantic search

Here is the minimal path from a fresh environment to a working similarity query. The whole thing takes about two minutes.

```bash
pip install chromadb
```

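The install pulls in the onnxruntime package. The all-MiniLM-L6-v2 model weights are downloaded automatically the first time the default embedding function runs and are cached locally, so after that one-time download Chroma generates embeddings on your machine without any API key.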

```python
import chromadb

# In-memory client: data disappears when the script exits
client = chromadb.EphemeralClient()

# Create a collection (like a table)
collection = client.create_collection(name="docs")

# Add documents; Chroma embeds them automatically
collection.add(
    documents=[
        "Transformers use self-attention to model long-range dependencies.",
        "A convolutional network slides filters over a 2D grid.",
        "Reinforcement learning agents learn by maximising a reward signal.",
        "Diffusion models generate images by gradually denoising random noise.",
    ],
    ids=["t1", "t2", "t3", "t4"],
)

# Query by natural language
results = collection.query(
    query_texts=["how do language models pay attention?"],
    n_results=2,
)

print(results["documents"])
# Example output (the second-ranked result may differ):
# [['Transformers use self-attention to model long-range dependencies.',
#   'Diffusion models generate images by gradually denoising random noise.']]
```

Adding metadata and filtering

Real applications rarely want pure semantic search. You almost always need to combine meaning-based ranking with structured constraints — "find me the most relevant article, but only from the last 30 days" or "only search docs tagged as public." Chroma handles this with the where clause, which filters on the metadata dictionary you attach to each document.

```python
# Continues the quick-start example: `collection` is the one created above
collection.add(
    documents=[
        "GPT-4 was released by OpenAI in March 2023.",
        "Gemini 1.5 Pro was released by Google in February 2024.",
        "Llama 3 was released by Meta in April 2024.",
    ],
    ids=["gpt4", "gemini", "llama3"],
    metadatas=[
        {"vendor": "openai", "year": 2023},
        {"vendor": "google", "year": 2024},
        {"vendor": "meta", "year": 2024},
    ],
)

# Semantic query restricted to 2024 releases only
results = collection.query(
    query_texts=["multimodal large language model"],
    n_results=2,
    where={"year": {"$gte": 2024}},  # metadata filter
)
```

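Metadata operators include $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, and the logical combinators $and / $or. Substring matching on the document text itself uses a separate where_document argument with $contains (or $not_contains). These compose freely, so you can express complex filters without a separate SQL query.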

Chroma vs the alternatives

Chroma is one of several vector store options, each optimised for a different point on the scale-vs-simplicity curve. Here is how it fits relative to the tools most developers encounter first.

	Chroma	Pinecone	pgvector	FAISS
Setup	pip install, zero config	Cloud account, API key	Postgres extension	C++ library, Python wrapper
Runs in-process	Yes	No (cloud API)	No (Postgres server)	Yes
Persistence	In-memory or SQLite on disk	Fully managed cloud	Postgres storage	Manual save/load
Horizontal scaling	Single node only	Fully distributed	Via Postgres replicas	No built-in clustering
Managed cloud option	Chroma Cloud (2025+)	Core product	Neon, Supabase, etc.	None
Best fit	Prototyping, local dev, RAG demos	Production at any scale	You already use Postgres	Research, offline batch

The migration path many teams follow is: Chroma for prototyping (zero friction, works on a laptop), then an evaluation of a managed service such as Pinecone or Chroma Cloud once you know your dataset size and query volume. Most vector stores expose similar add, query and filter operations, so porting the collection code is usually a modest job; re-ingesting a large corpus into the new store typically takes longer than the code changes.

Going deeper

Swap the embedding function

Chroma's default all-MiniLM-L6-v2 model is convenient but produces 384-dimensional vectors trained on general English text. For domain-specific corpora — code, medical records, multilingual content — you will get meaningfully better retrieval by swapping the embedding function. Chroma ships built-in wrappers for OpenAI, Cohere, Hugging Face, and Google Generative AI embeddings. Pass the wrapper when creating a collection and Chroma handles the rest:

```python
import os

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

client = chromadb.PersistentClient(path="./chroma_db")

ef = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],  # read the key from the environment, don't hard-code it
    model_name="text-embedding-3-small",
)

collection = client.get_or_create_collection(
    name="code_docs",
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"},  # cosine works well for OpenAI embeddings
)
```

HNSW tuning knobs

For latency-sensitive applications you can tune the HNSW index at collection-creation time via the metadata dictionary. The two most useful build parameters are hnsw:M (the number of links per node, default 16; higher values improve recall at the cost of more memory) and hnsw:construction_ef (the number of candidates explored while building the index, default 100; higher values produce a better-quality graph at the cost of slower inserts). These cannot be changed after the collection is created, so set them before you start ingesting data.

The get_or_create_collection pattern

In production-like scripts you almost always want get_or_create_collection instead of create_collection. The former is idempotent: it creates the collection on the first run and silently returns the existing one on every subsequent run. Using create_collection will throw an error if the collection already exists, which breaks restarts and re-deployments.

Chroma Cloud for when you outgrow a single node

Chroma Inc. launched Chroma Cloud as a generally available managed service in 2025. It runs the same API as the open-source library against a distributed backend — so your existing collection.add() and collection.query() code works unchanged. Key additions include serverless billing (no nodes to provision), hybrid search combining vector similarity with metadata filtering, multi-region deployment across AWS and GCP, and SOC 2 Type II compliance for teams with security requirements. The migration path is intentionally simple: replace EphemeralClient() or PersistentClient() with a cloud HttpClient pointed at your Chroma Cloud tenant, and the rest of your code is untouched.

What Chroma is not

Understanding the limits keeps you from hitting them by surprise. Chroma does not support: full-text BM25 keyword search (use Elasticsearch or Typesense for that), ACID transactions across multiple collections, relational joins, or row-level access control. It is a vector store, not a general-purpose database. For applications that need both semantic retrieval and rich relational queries, many teams use Chroma (or pgvector) alongside a traditional relational database rather than instead of one.

FAQ

Is ChromaDB free to use?

Yes. The open-source library (pip install chromadb) is free and Apache-2.0 licensed. Chroma Cloud, the managed hosted service launched in 2025, has a free tier and usage-based pricing for larger workloads.

Does Chroma work with JavaScript or TypeScript?

Yes. Chroma ships an official JavaScript/TypeScript client (npm install chromadb) that implements the same collection API. It connects to a running Chroma server over HTTP — there is no in-process mode for JS the way there is for Python.

How many vectors can Chroma handle on a single node?

Chroma's official guidance is that a single node can comfortably support tens of millions of vectors on appropriate hardware. Beyond roughly 50 million vectors, or under heavy concurrent write load, you should evaluate distributed alternatives like Chroma Cloud, Pinecone, or Milvus.
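
To get a feel for what "appropriate hardware" means, a back-of-the-envelope calculation helps. The sketch counts only the raw vectors, assuming 4-byte float32 components; the HNSW graph links, metadata and documents all add more on top, so treat the results as a lower bound:

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    """Bytes needed just to hold the vectors (assumes float32; index and metadata excluded)."""
    return num_vectors * dims * bytes_per_float

for n in (1_000_000, 10_000_000, 50_000_000):
    gb = raw_vector_bytes(n, dims=384) / 1e9
    print(f"{n:>11,} x 384-d float32 ~ {gb:.1f} GB")
```

For 384-dimensional vectors this gives roughly 1.5 GB per million vectors and about 76.8 GB at 50 million; with 1536-dimensional OpenAI embeddings the 50-million figure grows to about 307 GB, so the practical ceiling depends heavily on the embedding model you choose.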

Do I need a GPU to run Chroma?

No. The default all-MiniLM-L6-v2 embedding model runs on CPU via ONNX Runtime and is fast enough for prototyping on a laptop. If you use a heavier embedding model (e.g. a large Hugging Face model), a GPU will speed up embedding generation, but it is never required just to run Chroma itself.

Can I use my own embeddings instead of letting Chroma generate them?

Yes. Pass a pre-computed embeddings list to collection.add() alongside your documents and ids. Chroma will store them as-is. This is the typical approach when you have already generated embeddings with an external service like OpenAI and want to avoid re-computing them.

What happens to my data when I use EphemeralClient?

All data is stored in RAM and is lost the moment your Python process exits. Use PersistentClient(path=...) if you want data to survive restarts. The API is identical — switching between the two is a one-line change.
