FAISS

Library for efficient similarity search and clustering of dense vectors

github.com/facebookresearch/faiss · ★ 40.4k · faiss.ai

Overview

FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors. It is written in C++ with complete Python/NumPy wrappers, and is developed primarily by Meta's Fundamental AI Research group. You give it a set of vectors, and it finds the ones closest to a query using L2 (Euclidean) distance or dot product. Dot product also covers cosine similarity when the vectors are normalized to unit length.

It is built for developers working with embeddings in applications such as semantic search, recommendation, and retrieval-augmented generation. FAISS is an embedded, in-process library rather than a standalone server: you call it directly from your own code and keep the index in memory or on disk. This makes it a common building block underneath higher-level vector databases.

FAISS offers a range of index types so you can trade off search speed, accuracy, and memory. Exact indexes like IndexFlatL2 give precise results. Compressed indexes store only a compact code per vector, which costs some accuracy but scales to billions of vectors, while graph-based indexes (such as HNSW and NSG) add a search structure on top of the raw vectors to speed up approximate search. Some of the algorithms also run on the GPU, where GPU indexes can act as drop-in replacements for their CPU counterparts.

What it does

Exact and approximate nearest-neighbor search over dense vectors using L2 distance or dot product
Many index types that trade off search time, accuracy, and memory per vector
Compressed and quantized indexes that scale to billions of vectors in RAM on a single server
Graph-based indexes such as HNSW and NSG for faster approximate search
Optional GPU implementation (CUDA or AMD ROCm) where GPU indexes drop in for CPU ones, with single and multi-GPU support
C++ core with complete Python/numpy wrappers, plus k-means clustering and parameter-tuning utilities

Getting started

FAISS ships precompiled for Anaconda. Install the CPU package, then build a flat index and search it. The README points to the project wiki for the full tutorial.

Install with conda

Install the CPU build from the pytorch channel. A GPU build (faiss-gpu) is also available if you have CUDA.

bash

conda install -c pytorch faiss-cpu

Build an index and search

Create an exact IndexFlatL2 index (the class referenced in the README), add your vectors, then query for the nearest neighbors.

python

import numpy as np
import faiss

d = 64                           # vector dimension
xb = np.random.random((10000, d)).astype('float32')  # database
xq = np.random.random((5, d)).astype('float32')      # queries

index = faiss.IndexFlatL2(d)     # exact L2 index
index.add(xb)                    # add vectors

D, I = index.search(xq, k=4)     # 4 nearest neighbors
print(I)                          # neighbor ids
print(D)                          # squared L2 distances

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

Powering semantic search over embeddings, where you find the documents closest to a query vector
Serving as the in-process retrieval layer for a retrieval-augmented generation (RAG) pipeline
Running nearest-neighbor search at scale, up to billions of vectors, using compressed or GPU indexes
Clustering dense vectors with the built-in k-means implementation

How FAISS compares

FAISS alongside the other open-source vector database tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Supabase	★ 105k	Managed Postgres backend whose Vector toolkit (pgvector) stores, indexes, and queries embeddings next to transactional data.
Redis Cloud	★ 75k	Fully-managed Redis with built-in vector search, offering low-latency similarity and hybrid queries over any embeddings.
Milvus	★ 44.9k	A distributed vector database for storing and searching billions of embeddings at scale, with multiple index types and Kubernetes-native deployment.
FAISS	★ 40.4k	Library for efficient similarity search and clustering of dense vectors.
Qdrant	★ 32.5k	A Rust-based vector search engine that stores embeddings with rich payload filtering for semantic search and recommendation systems.
Chroma	★ 28.5k	A developer-focused vector database designed for quickly building retrieval and RAG features with a simple Python and JavaScript API.
pgvector	★ 21.8k	A PostgreSQL extension that adds a vector data type and similarity search so you can store and query embeddings inside an existing Postgres database.
Weaviate	★ 16.4k	A vector database with built-in hybrid search and LLM-provider integrations for building semantic search and retrieval applications.
