txtai

All-in-one embeddings database for semantic search, LLM orchestration, and RAG

github.com/neuml/txtai · ★ 12.7k · neuml.github.io/txtai

Overview

txtai is an open-source AI framework built around an embeddings database that combines vector indexes (sparse and dense), graph networks, and relational databases. You can index text, documents, audio, images, and video, then run vector search over them or use the index as a knowledge source for LLM applications.
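To make "sparse and dense" concrete, here is a toy, standard-library-only sketch of hybrid scoring. A keyword-overlap score (sparse) is blended with a cosine similarity between vectors (dense). The hand-written vectors, the `hybrid_search` function, and the linear `weight` mix are all assumptions for illustration. This is not txtai's internal scoring code.

```python
import math

# Toy corpus: each record has text plus a hand-made "dense" vector.
# A real system gets these vectors from an embedding model; these numbers are invented.
DOCS = {
    0: ("the cat sat on the mat", [0.9, 0.1, 0.0]),
    1: ("dogs chase cats in the park", [0.7, 0.6, 0.1]),
    2: ("stock markets fell sharply", [0.0, 0.2, 0.95]),
}

def cosine(a, b):
    # Dense signal: cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sparse_score(query, text):
    # Sparse signal: fraction of query terms that appear verbatim in the text.
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0

def hybrid_search(query, query_vector, weight=0.5, limit=2):
    # weight sets the mix: 1.0 = dense only, 0.0 = keywords only (hypothetical scheme).
    scored = []
    for uid, (text, vector) in DOCS.items():
        score = weight * cosine(query_vector, vector) + (1 - weight) * sparse_score(query, text)
        scored.append((uid, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

print(hybrid_search("cat", [0.85, 0.2, 0.0]))
```

For the sample query, record 1 never contains the word "cat" but still ranks second because its vector is close to the query vector. That is the gap dense search fills for pure keyword search.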

It is aimed at Python developers who want to build semantic search, retrieval augmented generation (RAG), and multi-model workflows without wiring together several separate services. Pipelines wrap language-model tasks such as question answering, summarization, transcription, and translation, and workflows join those pipelines into larger processes.
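The pipeline-and-workflow idea can be sketched with plain Python functions. Each stage takes a batch and returns a batch, and the workflow pushes every batch through the stages in order. In txtai this job belongs to the `Workflow` and `Task` classes in `txtai.workflow`. The stand-in stages and the `workflow` helper below are hypothetical: they only mimic the data flow and involve no models.

```python
from typing import Callable, Iterable, List

# A "pipeline" here is any callable mapping a batch of inputs to a batch of outputs.
Pipeline = Callable[[List[str]], List[str]]

def summarize(batch: List[str]) -> List[str]:
    # Placeholder "summary" (hypothetical): keep only the first sentence.
    return [text.split(". ")[0].rstrip(".") + "." for text in batch]

def shout(batch: List[str]) -> List[str]:
    # Placeholder second stage standing in for, e.g., a translation step.
    return [text.upper() for text in batch]

def workflow(stages: Iterable[Pipeline], data: List[str], batch_size: int = 2) -> List[str]:
    # Split the data into batches and pass each batch through every stage in order.
    stages = list(stages)
    results: List[str] = []
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        for stage in stages:
            batch = stage(batch)
        results.extend(batch)
    return results

print(workflow([summarize, shout], ["txtai runs locally. It is open source.",
                                    "Workflows chain pipelines. They scale."]))
```

Batching keeps memory bounded for large inputs, and because every stage shares the same batch-in, batch-out signature, stages can be reordered or swapped freely.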

Compared with other RAG frameworks, txtai bundles the vector store, retrieval, pipelines, and agents into a single package. It can run entirely locally, so your data can stay on your machine, and it ships with sensible defaults plus web and Model Context Protocol (MCP) APIs for use from other languages.

What it does

Embeddings database that unites sparse and dense vector indexes, graph networks, and relational storage
Vector search with SQL, object storage, topic modeling, and graph analysis, plus multimodal indexing across text, documents, audio, images, and video
Pipelines for LLM prompts, question answering, labeling, transcription, translation, and summarization
Workflows that chain pipelines together to build microservices or multi-model processes
Agents that connect embeddings, pipelines, workflows, and other agents to solve multi-step tasks
Web and Model Context Protocol (MCP) APIs with bindings for JavaScript, Java, Rust, and Go
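Graph networks let a search follow relationships instead of relying only on direct similarity. The following is a minimal sketch of that idea. It assumes a graph whose edges join records with cosine similarity at or above a threshold. Breadth-first search then finds an indirect chain between two records that are not similar to each other. The vectors, `min_score`, and both functions are hypothetical. This is not txtai's graph implementation.

```python
import math
from collections import deque

# Hypothetical 2-D vectors for five records; a real index would use model embeddings.
VECTORS = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.3],
    "c": [0.6, 0.8],
    "d": [0.1, 1.0],
    "e": [-1.0, 0.0],
}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def build_graph(vectors, min_score=0.8):
    # Undirected graph: connect two records when their similarity reaches min_score.
    graph = {node: set() for node in vectors}
    nodes = list(vectors)
    for i, x in enumerate(nodes):
        for y in nodes[i + 1:]:
            if cosine(vectors[x], vectors[y]) >= min_score:
                graph[x].add(y)
                graph[y].add(x)
    return graph

def shortest_path(graph, start, end):
    # Breadth-first search returns the path with the fewest hops, or None if unreachable.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

graph = build_graph(VECTORS)
print(shortest_path(graph, "a", "d"))
```

Here `a` and `d` are not directly similar (cosine around 0.1), but the graph links them through `b` and `c`, while `e` stays disconnected.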

Getting started

Install txtai with pip, then build an embeddings index and run a search in a few lines of Python.

Install txtai

Install the package from PyPI. txtai requires Python 3.10 or later.

```bash
pip install txtai
```

Index and search

Create an Embeddings instance, index a few records, and run a semantic search.

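```python
import txtai

# Create an embeddings database with the default settings
embeddings = txtai.Embeddings()

# Index two short records, then return the single best match for "positive"
embeddings.index(["Correct", "Not what we hoped"])
embeddings.search("positive", 1)
# [(0, 0.29862046241760254)]
```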
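The search call returns a list of `(id, score)` tuples, best match first. The toy below reproduces that result shape and maps ids back to the original text. Hand-written two-dimensional vectors stand in for model embeddings. The vectors and the `search` function are hypothetical. txtai computes real embeddings, so its scores will differ.

```python
import math

data = ["Correct", "Not what we hoped"]

# Hypothetical stand-in vectors; txtai would compute these with its embedding model.
vectors = [[0.8, 0.6], [0.2, 0.98]]

def search(query_vector, limit):
    # Score every record by cosine similarity and return the top (id, score) pairs.
    def cosine(u, v):
        return sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    scores = [(uid, cosine(query_vector, vector)) for uid, vector in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:limit]

# [1.0, 0.0] plays the role of the query "positive" in this sketch.
for uid, score in search([1.0, 0.0], 1):
    # In this sketch, ids are list positions, so they map straight back to the text.
    print(data[uid], round(score, 3))  # prints: Correct 0.8
```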

Serve it as an API (optional)

Define an embeddings config in a YAML file and run it behind the built-in FastAPI service.

```yaml
# app.yml
embeddings:
    path: sentence-transformers/all-MiniLM-L6-v2
```

Start the API server

Launch the API with uvicorn and query it over HTTP.

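```bash
# Start the API server (runs in the foreground)
CONFIG=app.yml uvicorn "txtai.api:app"

# From a second terminal, run a search over HTTP
curl -X GET "http://localhost:8000/search?query=positive"
```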
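The same request can also be sent from Python using only the standard library. The helper names are hypothetical. The optional `limit` parameter and the JSON response body are assumptions based on txtai's search API, so check the API docs before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def search_url(base, query, limit=None):
    # Build the same kind of GET request as the curl command above.
    params = {"query": query}
    if limit is not None:
        params["limit"] = limit  # assumed optional parameter
    return f"{base.rstrip('/')}/search?{urlencode(params)}"

def search(base, query, limit=None):
    # Needs the uvicorn server to be running; assumes the body is JSON.
    with urlopen(search_url(base, query, limit)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `search("http://localhost:8000", "positive")` issues the same request as the curl example. `urlencode` escapes the query, so multi-word searches need no manual quoting.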

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

Build a semantic or similarity search application that matches on meaning instead of exact keywords
Power a RAG pipeline by using the embeddings database as the retrieval source for an LLM
Run language-model pipelines for summarization, translation, transcription, or question answering
Chain pipelines into workflows or autonomous agents that solve multi-step tasks
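The RAG use case above reduces to two steps: retrieve relevant passages, then place them in the prompt sent to the LLM. The sketch below shows only the prompt-assembly side, under stated assumptions. Word-overlap ranking stands in for txtai's vector search. The knowledge base, the prompt template, and the function names are hypothetical. No LLM is called.

```python
import re

# Hypothetical knowledge base; in a txtai application this would live in the embeddings database.
DOCUMENTS = [
    "txtai is an open-source embeddings database.",
    "txtai requires Python 3.10 or later.",
    "Workflows chain pipelines together.",
]

def tokens(text):
    # Lowercased word set used for the crude overlap ranking below.
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, limit=2):
    # Rank passages by how many words they share with the question.
    # This stands in for vector search; it is not how txtai ranks results.
    query = tokens(question)
    ranked = sorted(DOCUMENTS, key=lambda doc: len(query & tokens(doc)), reverse=True)
    return ranked[:limit]

def build_prompt(question, limit=2):
    # Put the retrieved passages ahead of the question so the LLM answers from them.
    context = "\n".join(f"- {passage}" for passage in retrieve(question, limit))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

print(build_prompt("Which Python version does txtai require?", limit=1))
```

Grounding the prompt in retrieved text lets the model answer from your own data, not just from its training set. Swapping the overlap ranking for embeddings search improves recall on paraphrased questions.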

How txtai compares

txtai alongside other open-source RAG frameworks and platforms tracked by AI/TLDR, ranked by GitHub stars.

Tool	Stars	What it does
Dify	★ 146k	An open-source platform with a visual workflow builder for creating LLM and RAG applications without writing much code.
RAGFlow	★ 83.2k	A RAG engine built around deep document understanding that turns complex files into a grounded, citation-backed question-answering layer.
Context7	★ 57.7k	Context7 pulls current, version-specific documentation and code examples for any library and feeds them into your LLM, available as a CLI skill or an MCP server.
Quivr	★ 39.2k	Quivr is an open-source RAG framework that ingests your documents and answers questions about them, working with any LLM and any file type.
LightRAG	★ 36.8k	A graph-based RAG system that builds an entity-and-relationship knowledge graph for fast retrieval and easy incremental updates.
GraphRAG	★ 33.9k	Microsoft's graph-based RAG system that extracts a knowledge graph from documents to answer broad, multi-document questions.
PageIndex	★ 33.2k	PageIndex turns long PDFs into a table-of-contents tree and uses LLM reasoning to retrieve relevant sections, with no vector database and no chunking.
txtai	★ 12.7k	All-in-one embeddings database for semantic search, LLM orchestration, and RAG.
