Overview
ModernBERT is an updated take on the BERT encoder from Answer.AI, LightOn, and collaborators. It brings architecture changes and modern training to the encoder family, with support for long context and faster inference. This repository is the research codebase used for pre-training and GLUE evaluations; the ready-to-use models live in the ModernBERT collection on Hugging Face.
It is meant for developers and researchers who need a strong encoder backbone rather than a generative model. Encoders like ModernBERT turn text into dense vectors, which makes them a good fit for embeddings, retrieval, classification, and other tasks where you compare or rank text instead of generating it.
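To make "compare or rank text" concrete, here is a minimal sketch of ranking documents by cosine similarity. The three-dimensional vectors and the document names are made-up stand-ins for encoder output, not real ModernBERT embeddings, and the helper names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = {doc_id: cosine_similarity(query_vec, vec)
              for doc_id, vec in doc_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy embeddings; a real encoder outputs hundreds of dimensions.
query = [0.9, 0.1, 0.0]
documents = {
    "doc_paris": [0.8, 0.2, 0.1],
    "doc_cooking": [0.0, 0.3, 0.9],
    "doc_travel": [0.5, 0.5, 0.2],
}

ranking = rank_by_similarity(query, documents)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a real system the vectors would come from a fine-tuned encoder, and the ranking step would usually be handled by a vector index rather than a Python loop.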
Within the embedding-models space, ModernBERT is most often used as a starting point you fine-tune. The repository ships example scripts for training dense retrieval models with Sentence Transformers and ColBERT models with PyLate, so teams can build their own embedding or retrieval systems on top of it.
What it does
- Modernized BERT architecture introducing FlexBERT, a modular approach to encoder building blocks configured through YAML files
- Long-context support (sequences of up to 8,192 tokens) and faster inference than the original BERT
- Uses Flash Attention 2 (and Flash Attention 3 on supported H100 hardware) for efficient training and inference
- Ready-to-use checkpoints on Hugging Face that integrate with common transformers pipelines
- Example scripts for training and evaluating dense retrieval (Sentence Transformers) and ColBERT (PyLate) models
- GLUE evaluation tooling via run_evals.py and glue.py for benchmarking models
Getting started
For most use cases, load a released ModernBERT checkpoint from Hugging Face with the transformers library. The research repository on GitHub is only needed for pre-training and evaluation, and it is set up separately with conda.
Install transformers
Install the Hugging Face transformers library along with PyTorch to load and run the model. ModernBERT support was added in transformers 4.48.0, so upgrade if you are on an older release.
pip install -U transformers torch
Load the model and tokenizer
Load ModernBERT-base from the Hugging Face Hub and run a masked-language-modeling example, the task the base model is trained on.
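from transformers import AutoTokenizer, AutoModelForMaskedLM
model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
# Tokenize a sentence that contains the tokenizer's mask token.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
# Find the masked position and decode the highest-scoring token there.
mask_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print("Predicted token:", tokenizer.decode(predicted_id))
Train your own retrieval model (optional)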
To fine-tune ModernBERT into an embedding or retrieval model, clone this repository and use the example scripts under the examples folder, which cover Sentence Transformers and PyLate (ColBERT).
git clone https://github.com/AnswerDotAI/ModernBERT.git
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
When to use it
- Use ModernBERT as a backbone to fine-tune dense embedding models for semantic search and retrieval
- Train ColBERT-style late-interaction retrievers with the included PyLate example scripts
- Fine-tune the encoder for text classification or other GLUE-style downstream tasks
- Reproduce or extend the ModernBERT pre-training and evaluation experiments
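For readers new to the late-interaction retrieval mentioned in the list above, the following is a minimal sketch of the MaxSim scoring idea behind ColBERT-style models: each query token is matched to its best document token, and the best matches are summed. The token vectors are hypothetical toy values, and the function names are illustrative; this is not how PyLate implements scoring, which operates on batched tensors.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Hypothetical unit-length 2-d token embeddings (one vector per token).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.6, 0.8], [0.8, 0.6]]

score_a = maxsim_score(query_tokens, doc_a)
score_b = maxsim_score(query_tokens, doc_b)
print(f"doc_a: {score_a:.2f}, doc_b: {score_b:.2f}")
```

Unlike a dense retriever, which compares one vector per text, this scheme keeps one vector per token, so it trades extra storage for finer-grained matching.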
How ModernBERT compares
How ModernBERT compares with the other open-source embedding models and inference tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Sentence Transformers | ★ 18.8k | The standard Python framework for loading, training, and computing embeddings with sentence and reranking models. |
| EmbeddingGemma (Gemma) | ★ 5.5k | Google DeepMind's Gemma repo, home to EmbeddingGemma, a 308M multilingual embedding model small enough to run on-device for RAG and semantic search. |
| Text Embeddings Inference (TEI) | ★ 4.9k | Hugging Face's Rust-based server for deploying embedding, reranking, and sequence-classification models with high throughput on GPU or CPU. |
| Infinity (Embeddings) | ★ 2.8k | A high-throughput serving engine for text embeddings, rerankers, CLIP, and ColPali models, exposing an OpenAI-compatible API. |
| ColPali | ★ 2.7k | A vision-language embedding model that indexes whole document page images for retrieval, avoiding the need to parse PDFs into text first. |
| Model2Vec | ★ 2.1k | A tool that distills any sentence transformer into a tiny, fast static embedding model (the Potion models) that runs on CPU without a neural network at inference. |
| Instructor Embedding | ★ 2k | Instruction-tuned text embedding models that let you tailor embeddings to a task by prepending a natural-language instruction. |
| ModernBERT | ★ 1.7k | A modernized BERT encoder with long context and faster inference. |