ModernBERT

A modernized BERT encoder with long context and faster inference

GitHub: github.com/AnswerDotAI/ModernBERT (★ 1.7k)
Hugging Face: huggingface.co/answerdotai/ModernBERT-base

Overview

ModernBERT is an updated take on the BERT encoder from Answer.AI, LightOn, and collaborators. It brings architecture changes and modern training to the encoder family, with support for long context and faster inference. This repository is the research codebase used for pre-training and GLUE evaluations; the ready-to-use models live in the ModernBERT collection on Hugging Face.

It is meant for developers and researchers who need a strong encoder backbone rather than a generative model. Encoders like ModernBERT turn text into dense vectors, which makes them a good fit for embeddings, retrieval, classification, and other tasks where you compare or rank text instead of generating it.
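To make "compare or rank text" concrete, here is a minimal sketch of ranking documents against a query by cosine similarity once each text has been turned into a vector. The three-dimensional vectors and the document names are invented for illustration; real encoder outputs have hundreds of dimensions and would come from a fine-tuned model. The helper names `cosine_similarity` and `rank_documents` are hypothetical, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for illustration (not real model outputs).
query = [0.9, 0.1, 0.0]
docs = {
    "doc_paris": [0.8, 0.2, 0.1],
    "doc_cooking": [0.0, 0.1, 0.9],
    "doc_travel": [0.5, 0.5, 0.2],
}

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In practice the same ranking step is usually done with a vectorized library or a vector index; the plain-Python version above only shows the arithmetic.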

Within the embedding-models space, ModernBERT is most often used as a starting point you fine-tune. The repository ships example scripts for training dense retrieval models with Sentence Transformers and ColBERT models with PyLate, so teams can build their own embedding or retrieval systems on top of it.
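An encoder produces one vector per token, so a dense embedding model needs a pooling step to get one vector per text. One common choice is masked mean pooling, sketched below in plain Python. This is an illustration under stated assumptions, not the code from the repository's example scripts: the function name `mean_pool` is hypothetical, the token vectors are made up, and a given training recipe may use a different pooling strategy.

```python
def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask masks out every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Two real tokens followed by one padding position (toy values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0]
```

The padding vector is ignored, which is the point of passing the attention mask: without it, padded positions would drag the average toward meaningless values.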

What it does

Modernized BERT architecture introducing FlexBERT, a modular approach to encoder building blocks configured through YAML files
Long-context support (a native sequence length of up to 8,192 tokens, versus 512 for the original BERT) and faster inference
Uses Flash Attention 2 (and Flash Attention 3 on supported H100 hardware) for efficient training and inference
Ready-to-use checkpoints on Hugging Face that integrate with common transformers pipelines
Example scripts for training and evaluating dense retrieval (Sentence Transformers) and ColBERT (PyLate) models
GLUE evaluation tooling via run_evals.py and glue.py for benchmarking models

Getting started

For most use cases, load a released ModernBERT checkpoint from Hugging Face with the transformers library. The GitHub repository is the research codebase for pre-training and evaluation; it is installed separately via conda.

Install transformers

Install the Hugging Face transformers library along with PyTorch to load and run the model. ModernBERT support was added in transformers 4.48.0, so an older install needs upgrading.

bash

pip install -U transformers torch

Load the model and tokenizer

Load ModernBERT-base from the Hugging Face Hub and run a masked-language-modeling example, the task the base model is trained on.

python

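from transformers import AutoTokenizer, AutoModelForMaskedLM

# Download (or load from cache) the tokenizer and the masked-LM head
model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Tokenize a sentence containing one [MASK] token and run a forward pass
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Find the [MASK] position and decode the highest-scoring token there
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
print("Predicted token:", tokenizer.decode(predicted_token_id))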

Train your own retrieval model (optional)

To fine-tune ModernBERT into an embedding or retrieval model, clone this repository and use the example scripts under the examples folder, which cover Sentence Transformers and PyLate (ColBERT).

bash

git clone https://github.com/AnswerDotAI/ModernBERT.git

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Use ModernBERT as a backbone to fine-tune dense embedding models for semantic search and retrieval
Train ColBERT-style late-interaction retrievers with the included PyLate example scripts
Fine-tune the encoder for text classification or other GLUE-style downstream tasks
Reproduce or extend the ModernBERT pre-training and evaluation experiments
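For readers new to late interaction, the scoring rule behind ColBERT-style retrievers can be sketched in a few lines. Instead of one vector per text, the query and the document each keep one vector per token, and the score is the sum, over query tokens, of the best match among the document tokens (often called MaxSim). The sketch below is plain Python with made-up, unnormalized two-dimensional vectors; the names `dot` and `maxsim_score` are hypothetical, and this is not PyLate's implementation.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product against any document token."""
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy per-token vectors, invented for illustration.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has an exact match for each query token
doc_b = [[0.5, 0.5]]                          # only a partial match for each query token

print(maxsim_score(query_tokens, doc_a))  # 2.0
print(maxsim_score(query_tokens, doc_b))  # 1.0
```

Because every query token picks its own best document token, a document can score well by covering each part of the query somewhere in its text, which a single pooled vector cannot express as directly.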

How ModernBERT compares

ModernBERT alongside other open-source embedding models and inference tools tracked by AI/TLDR, ranked by GitHub stars.

Tool	Stars	What it does
Sentence Transformers	★ 18.8k	The standard Python framework for loading, training, and computing embeddings with sentence and reranking models.
EmbeddingGemma (Gemma)	★ 5.5k	Google DeepMind's Gemma repo, home to EmbeddingGemma, a 308M multilingual embedding model small enough to run on-device for RAG and semantic search.
Text Embeddings Inference (TEI)	★ 4.9k	Hugging Face's Rust-based server for deploying embedding, reranking, and sequence-classification models with high throughput on GPU or CPU.
Infinity (Embeddings)	★ 2.8k	A high-throughput serving engine for text embeddings, rerankers, CLIP, and ColPali models, exposing an OpenAI-compatible API.
ColPali	★ 2.7k	A vision-language embedding model that indexes whole document page images for retrieval, avoiding the need to parse PDFs into text first.
Model2Vec	★ 2.1k	A tool that distills any sentence transformer into a tiny, fast static embedding model (the Potion models) that runs on CPU without a neural network at inference.
Instructor Embedding	★ 2k	Instruction-tuned text embedding models that let you tailor embeddings to a task by prepending a natural-language instruction.
ModernBERT	★ 1.7k	A modernized BERT encoder with long context and faster inference
