Chroma

Open-source embedding database for building retrieval and RAG features

github.com/chroma-core/chroma | ★ 28.5k | trychroma.com

Overview

Chroma is an open-source vector database for storing embeddings and running similarity search over them. Its core API is small (around four functions), so you can create a collection, add documents, and query the closest matches without much setup. Chroma can handle tokenization, embedding, and indexing for you, or you can supply your own embeddings.
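
To make "similarity search" concrete, here is a minimal pure-Python sketch of what a nearest-match lookup does conceptually. It is an illustration, not Chroma's implementation: the three-dimensional vectors, the `cosine_similarity` and `nearest` helpers, and the choice of cosine similarity as the metric are all assumptions made for the example. Real embeddings have hundreds of dimensions, and a vector database uses an index instead of scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors, k=1):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda vec_id: cosine_similarity(query, vectors[vec_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy "embeddings", keyed by document id.
toy_vectors = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "tax-law": [0.0, 0.1, 0.9],
}

closest = nearest([1.0, 0.0, 0.0], toy_vectors, k=2)
print(closest)
```

A vector database such as Chroma wraps this idea behind a collection API and adds storage, indexing, and filtering on top.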

It targets developers who are adding retrieval or RAG features to an application and want to start quickly. You can run it in-memory for prototyping, add persistence when you need it, or run it in client-server mode. Clients are available for both Python and JavaScript.

Within the vector database category, Chroma leans toward a simple developer experience over heavy configuration. For teams that prefer not to self-host, there is a hosted option, Chroma Cloud, but the project itself runs locally and is licensed under Apache 2.0.

What it does

Small core API: create_collection, add, query, and get cover the common workflow
Automatic tokenization, embedding, and indexing, or bring your own embeddings
Metadata and full-text filters via where and where_document on queries
Runs in-memory for prototyping, with persistence and client-server modes available
Python and JavaScript clients, each published under the name chromadb (on PyPI and npm respectively)
Optional Chroma Cloud hosted service for serverless vector, hybrid, and full-text search

Getting started

Install the client, then create a collection, add a few documents, and query for the closest matches.

Install the client

Install the Python client with pip, or the JavaScript client with npm. For client-server mode you can run a Chroma server against a database path.

bash

pip install chromadb # python client
# for javascript, npm install chromadb!
# for client-server mode, chroma run --path /chroma_db_path

Create a collection and add documents

Start an in-memory client for prototyping, create a collection, and add documents with metadata and ids. Chroma handles embedding and indexing automatically.

python

import chromadb
# Set up Chroma in-memory for easy prototyping; persistence can be added later.
client = chromadb.Client()

# Create collection. get_collection, get_or_create_collection, delete_collection also available!
collection = client.create_collection("all-my-documents")

# Add docs to the collection. Can also update and delete. Row-based API coming soon!
collection.add(
    # Chroma handles tokenization, embedding, and indexing automatically;
    # you can skip that and pass your own embeddings instead.
    documents=["This is document1", "This is document2"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on these!
    ids=["doc1", "doc2"], # unique for each doc
)

Query for similar results

Search for the most similar documents to a query, with optional metadata or document-text filters.

python

results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"}, # optional filter
    # where_document={"$contains":"search_string"}  # optional filter
)

Commands and code are distilled from the project's own documentation. Always check the official repo for the latest.

When to use it

Build a RAG pipeline that retrieves relevant context before calling an LLM
Add semantic search over documents, notes, or product data in an app
Prototype retrieval features in-memory, then move to persistence or client-server mode
Filter search results by metadata fields such as source or category
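
For the RAG use case in the list above, the retrieval step typically feeds retrieved documents into a prompt. The sketch below shows one possible way to do that in plain Python. The `build_prompt` helper, the prompt wording, and `sample_results` are hypothetical; `sample_results` is a hand-written stand-in shaped like the dictionary a collection query returns, so the example runs without a database or an LLM.

```python
def build_prompt(question, results, max_docs=3):
    """Assemble an LLM prompt from the documents of the first query."""
    docs = results["documents"][0][:max_docs]
    context = "\n".join(f"- {doc}" for doc in docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Stand-in for a query result: one inner list per query.
sample_results = {
    "ids": [["doc1", "doc2"]],
    "documents": [["This is document1", "This is document2"]],
    "metadatas": [[{"source": "notion"}, {"source": "google-docs"}]],
    "distances": [[0.12, 0.34]],
}

prompt = build_prompt("What does document1 say?", sample_results)
print(prompt)
```

The resulting string would then be sent to whichever LLM the application uses; that call is left out because it depends on the provider.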

How Chroma compares

Chroma alongside the other open-source vector database tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Supabase	★ 105k	Managed Postgres backend whose Vector toolkit (pgvector) stores, indexes, and queries embeddings next to transactional data.
Redis Cloud	★ 75k	Fully-managed Redis with built-in vector search, offering low-latency similarity and hybrid queries over any embeddings.
Milvus	★ 44.9k	A distributed vector database for storing and searching billions of embeddings at scale, with multiple index types and Kubernetes-native deployment.
FAISS	★ 40.4k	A library from Meta for efficient similarity search and clustering of dense vectors, with both exact and approximate indexes.
Qdrant	★ 32.5k	A Rust-based vector search engine that stores embeddings with rich payload filtering for semantic search and recommendation systems.
Chroma	★ 28.5k	Open-source embedding database for building retrieval and RAG features.
pgvector	★ 21.8k	A PostgreSQL extension that adds a vector data type and similarity search so you can store and query embeddings inside an existing Postgres database.
Weaviate	★ 16.4k	A vector database with built-in hybrid search and LLM-provider integrations for building semantic search and retrieval applications.
