AI/TLDR

ModernBERT

A modernized BERT encoder with long context and faster inference

Overview

ModernBERT is an updated take on the BERT encoder from Answer.AI, LightOn, and collaborators. It brings architecture changes and modern training to the encoder family, with support for long context and faster inference. This repository is the research codebase used for pre-training and GLUE evaluations; the ready-to-use models live in the ModernBERT collection on Hugging Face.

It is meant for developers and researchers who need a strong encoder backbone rather than a generative model. Encoders like ModernBERT turn text into dense vectors, which makes them a good fit for embeddings, retrieval, classification, and other tasks where you compare or rank text instead of generating it.
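
To make "compare or rank" concrete, here is a minimal sketch of ranking documents against a query by cosine similarity. The three-dimensional vectors are made-up stand-ins for encoder output (real embeddings have hundreds of dimensions), and `cosine` and `rank` are illustrative helper names, not part of any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy vectors standing in for encoder outputs (assumed values, not real embeddings).
query = [0.9, 0.1, 0.0]
docs = {
    "paris": [0.8, 0.2, 0.1],
    "cooking": [0.0, 0.3, 0.9],
    "france": [0.7, 0.0, 0.2],
}
print(rank(query, docs))
```

In a real system the vectors would come from a fine-tuned encoder, and the ranking step would stay the same.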

Within the embedding-models space, ModernBERT is most often used as a starting point you fine-tune. The repository ships example scripts for training dense retrieval models with Sentence Transformers and ColBERT models with PyLate, so teams can build their own embedding or retrieval systems on top of it.

What it does

  • Modernized BERT architecture introducing FlexBERT, a modular approach to encoder building blocks configured through YAML files
  • Long-context support (sequences up to 8,192 tokens) and faster inference than the original BERT
  • Uses Flash Attention 2 (and Flash Attention 3 on supported H100 hardware) for efficient training and inference
  • Ready-to-use checkpoints on Hugging Face that integrate with common transformers pipelines
  • Example scripts for training and evaluating dense retrieval (Sentence Transformers) and ColBERT (PyLate) models
  • GLUE evaluation tooling via run_evals.py and glue.py for benchmarking models

Getting started

For most use cases, load a released ModernBERT checkpoint from Hugging Face with the transformers library. The GitHub research repository is only needed for pre-training and evaluation, and is set up separately with conda.

Install transformers

Install the Hugging Face transformers library along with PyTorch to load and run the model. ModernBERT support requires transformers 4.48.0 or later.

bash
pip install -U transformers torch

Load the model and tokenizer

Load ModernBERT-base from the Hugging Face Hub and run a masked-language-modeling example, the task the base model is trained on.

python
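import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Locate the [MASK] position and decode the highest-scoring token there
mask_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
print("Predicted token:", tokenizer.decode(predicted_id))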
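
The model's `logits` output holds one score per vocabulary entry at each position. The usual way to read a prediction is to apply a softmax to the row at the `[MASK]` position and take the top entries. Below is a minimal, library-free sketch of that step, using a made-up five-word vocabulary and made-up scores in place of real model output; `softmax` and `top_k` are illustrative helpers.

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, vocab, k=3):
    """Return the k most probable (token, probability) pairs."""
    probs = softmax(scores)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical vocabulary and logits for the [MASK] position (not real model output).
vocab = ["Paris", "London", "Lyon", "the", "blue"]
mask_logits = [6.0, 3.5, 4.0, 0.5, -1.0]

for token, prob in top_k(mask_logits, vocab):
    print(f"{token}: {prob:.3f}")
```

With real model output, the same step would be a softmax and top-k over the logits row at the `[MASK]` position.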

Train your own retrieval model (optional)

To fine-tune ModernBERT into an embedding or retrieval model, clone this repository and use the example scripts under the examples folder, which cover Sentence Transformers and PyLate (ColBERT).

bash
git clone https://github.com/AnswerDotAI/ModernBERT.git
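
Dense retrieval fine-tuning commonly optimizes a contrastive objective with in-batch negatives: each query should score its own passage higher than every other passage in the batch. The sketch below shows that objective in plain Python on made-up similarity scores. Whether the repository's example scripts use exactly this loss is an assumption, so check them before relying on it. `in_batch_contrastive_loss` is an illustrative name, and the `scale` of 20.0 mirrors the default of Sentence Transformers' MultipleNegativesRankingLoss.

```python
import math

def in_batch_contrastive_loss(sim, scale=20.0):
    """Mean cross-entropy over rows of a square similarity matrix.

    sim[i][j] is the similarity between query i and passage j; the
    matching passage for query i sits on the diagonal (j == i).
    `scale` sharpens the softmax (assumed value, see the lead-in).
    """
    losses = []
    for i, row in enumerate(sim):
        scaled = [scale * s for s in row]
        peak = max(scaled)
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
        losses.append(log_sum_exp - scaled[i])  # -log softmax of the positive
    return sum(losses) / len(losses)

# Made-up cosine similarities for a batch of three (query, passage) pairs.
well_separated = [[0.9, 0.1, 0.2], [0.0, 0.8, 0.1], [0.2, 0.1, 0.9]]
confused = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

print(in_batch_contrastive_loss(well_separated))
print(in_batch_contrastive_loss(confused))
```

The loss is low when each positive pair clearly outscores the rest of its row, and equals the log of the batch size when the model cannot tell passages apart.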

Commands and code are distilled from the project's own documentation; always check the official repository for the latest version.

When to use it

  • Use ModernBERT as a backbone to fine-tune dense embedding models for semantic search and retrieval
  • Train ColBERT-style late-interaction retrievers with the included PyLate example scripts
  • Fine-tune the encoder for text classification or other GLUE-style downstream tasks
  • Reproduce or extend the ModernBERT pre-training and evaluation experiments
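
ColBERT-style late interaction keeps one vector per token rather than one per text. It scores a query against a document by taking, for each query token, its best match among the document tokens and summing those maxima (often called MaxSim). Here is a minimal sketch with made-up two-dimensional token vectors; `maxsim_score` is an illustrative name and not PyLate's API.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Sum over query tokens of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy token embeddings (assumed values; real ones come from a fine-tuned encoder).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]
doc_b = [[0.2, 0.1], [0.3, 0.2]]

print(maxsim_score(query, doc_a))
print(maxsim_score(query, doc_b))
```

Because every query token is matched independently, a document can score well by covering each part of the query somewhere in its text, which a single pooled vector cannot express.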

How ModernBERT compares

ModernBERT alongside the other open-source embedding models and inference tools that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Sentence Transformers | ★ 18.8k | The standard Python framework for loading, training, and computing embeddings with sentence and reranking models.
EmbeddingGemma (Gemma) | ★ 5.5k | Google DeepMind's Gemma repo, home to EmbeddingGemma, a 308M multilingual embedding model small enough to run on-device for RAG and semantic search.
Text Embeddings Inference (TEI) | ★ 4.9k | Hugging Face's Rust-based server for deploying embedding, reranking, and sequence-classification models with high throughput on GPU or CPU.
Infinity (Embeddings) | ★ 2.8k | A high-throughput serving engine for text embeddings, rerankers, CLIP, and ColPali models, exposing an OpenAI-compatible API.
ColPali | ★ 2.7k | A vision-language embedding model that indexes whole document page images for retrieval, avoiding the need to parse PDFs into text first.
Model2Vec | ★ 2.1k | A tool that distills any sentence transformer into a tiny, fast static embedding model (the Potion models) that runs on CPU without a neural network at inference.
Instructor Embedding | ★ 2k | Instruction-tuned text embedding models that let you tailor embeddings to a task by prepending a natural-language instruction.
ModernBERT | ★ 1.7k | A modernized BERT encoder with long context and faster inference.