Overview
Chroma is an open-source vector database for storing embeddings and running similarity search over them. Its core API is small (around four functions), so you can create a collection, add documents, and query the closest matches without much setup. Chroma can handle tokenization, embedding, and indexing for you, or you can supply your own embeddings.
It targets developers who are adding retrieval or RAG features to an application and want to start quickly. You can run it in-memory for prototyping, add persistence when you need it, or run it in client-server mode. Clients are available for both Python and JavaScript.
Within the vector database category, Chroma leans toward a simple developer experience over heavy configuration. The project is licensed under Apache 2.0 and runs on your own infrastructure; teams that prefer not to self-host can use the hosted Chroma Cloud service instead.
What it does
- Small core API: create_collection, add, query, and get cover the common workflow
- Automatic tokenization, embedding, and indexing, or bring your own embeddings
- Metadata and full-text filters via where and where_document on queries
- Runs in-memory for prototyping, with persistence and client-server modes available
- Python and JavaScript clients, both published under the package name chromadb (pip and npm)
- Optional Chroma Cloud hosted service for serverless vector, hybrid, and full-text search
Getting started
Install the client, then create a collection, add a few documents, and query for the closest matches.
Install the client
Install the Python client with pip, or the JavaScript client with npm. For client-server mode you can run a Chroma server against a database path.
```sh
pip install chromadb # python client
# for javascript: npm install chromadb
# for client-server mode: chroma run --path /chroma_db_path
```
Create a collection and add documents
Start an in-memory client for prototyping, create a collection, and add documents with metadata and ids. Chroma handles embedding and indexing automatically.
```python
import chromadb
# setup Chroma in-memory, for easy prototyping. Can add persistence easily!
client = chromadb.Client()
# Create collection. get_collection, get_or_create_collection, delete_collection also available!
collection = client.create_collection("all-my-documents")
# Add docs to the collection. Can also update and delete. Row-based API coming soon!
collection.add(
    documents=["This is document1", "This is document2"], # we handle tokenization, embedding, and indexing automatically. You can skip that and add your own embeddings as well
    metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on these!
    ids=["doc1", "doc2"], # unique for each doc
)
```
Query for similar results
Search for the most similar documents to a query, with optional metadata or document-text filters.
```python
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"}, # optional filter
    # where_document={"$contains":"search_string"} # optional filter
)
```
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
When to use it
- Build a RAG pipeline that retrieves relevant context before calling an LLM
- Add semantic search over documents, notes, or product data in an app
- Prototype retrieval features in-memory, then move to persistence or client-server mode
- Filter search results by metadata fields such as source or category
How Chroma compares
Chroma alongside the other open-source vector database tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Supabase | ★ 105k | Managed Postgres backend whose Vector toolkit (pgvector) stores, indexes, and queries embeddings next to transactional data. |
| Redis Cloud | ★ 75k | Fully-managed Redis with built-in vector search, offering low-latency similarity and hybrid queries over any embeddings. |
| Milvus | ★ 44.9k | A distributed vector database for storing and searching billions of embeddings at scale, with multiple index types and Kubernetes-native deployment. |
| FAISS | ★ 40.4k | A library from Meta for efficient similarity search and clustering of dense vectors, with both exact and approximate indexes. |
| Qdrant | ★ 32.5k | A Rust-based vector search engine that stores embeddings with rich payload filtering for semantic search and recommendation systems. |
| Chroma | ★ 28.5k | Open-source embedding database for building retrieval and RAG features. |
| pgvector | ★ 21.8k | A PostgreSQL extension that adds a vector data type and similarity search so you can store and query embeddings inside an existing Postgres database. |
| Weaviate | ★ 16.4k | A vector database with built-in hybrid search and LLM-provider integrations for building semantic search and retrieval applications. |