LightOn AI · 2026-04-21 · notable

DenseOn & LateOn — New BEIR Records at 149M Parameters, Fully Open

Rating: 3
Author: AI/TLDR

LightOn releases two 149M-parameter retrieval models. LateOn (ColBERT, 57.22 BEIR) sets a new record for any sub-150M model and beats all previous ColBERT models; DenseOn (dense, 56.20 BEIR) is the first sub-150M model to clear 56. Both are released under Apache 2.0, with the full training data open.

[Figure: BEIR benchmark comparison for LightOn DenseOn and LateOn, showing new records at 149M parameters]

Two 149M-parameter retrieval models from LightOn AI set new BEIR records — the smallest ColBERT and dense models to reach these scores.

Key specs

License	Apache 2.0
Parameters (each)	149M
LateOn BEIR NDCG@10	57.22
DenseOn BEIR NDCG@10	56.20
LateOn decontaminated BEIR NDCG@10	60.36
DenseOn decontaminated BEIR NDCG@10	57.71

What is it?

LightOn AI released two open-source text retrieval models built on ModernBERT at 149M parameters each. LateOn is a multi-vector (ColBERT-style) retrieval model scoring 57.22 BEIR NDCG@10 — the first ColBERT model and the first sub-150M model of any type to break 57. DenseOn is a single-vector (dense) retrieval model scoring 56.20 — the first sub-150M model to clear 56, outperforming models up to 4× larger. Both are Apache 2.0 with full pre-training data, fine-tuning data, and decontaminated benchmark sets published.

How does it work?

Both models use ModernBERT-base as the backbone. LateOn is trained with the PyLate ColBERT framework and produces one 128-dimensional embedding per token. Query and document token embeddings are scored with MaxSim late interaction, which captures fine-grained query-document term overlap without an extra reranking pass. DenseOn produces a single 768-dimensional embedding per document, scored via cosine similarity. Pre-training used 665M curated query-document pairs; fine-tuning used 1.69M contrastive pairs with hard negatives. In the decontamination analysis, both models score higher once training-set overlap is removed from the test sets (60.36 and 57.71 NDCG@10, versus 57.22 and 56.20 on standard BEIR), which suggests the results reflect generalization rather than memorized benchmark data.
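The two scoring schemes described above can be sketched in a few lines. This is an illustrative toy in pure Python, not LightOn's or PyLate's implementation: the function names and the 2-dimensional vectors are hypothetical (the real models use 128 and 768 dimensions), and it assumes MaxSim is computed over cosine similarities between token embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def dense_score(query_vec, doc_vec):
    """Single-vector scoring: one embedding per query and per document."""
    return cosine(query_vec, doc_vec)

def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction scoring: for each query token, keep its best
    match among the document tokens, then sum over query tokens."""
    return sum(
        max(cosine(q, d) for d in doc_tokens)
        for q in query_tokens
    )

# Toy token embeddings (hypothetical values, 2 dimensions for readability).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # matches both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]              # matches only the first

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a > score_b)
```

Because each query token is matched independently, a document that covers every query term outranks one that matches a single term very strongly, which is the fine-grained behavior a single pooled vector can blur.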

Why does it matter?

For RAG pipelines, retrieval model size directly affects latency, memory, and per-query cost. At FP32, the weights of a 149M-parameter model take roughly 600 MB (149M parameters × 4 bytes), small enough to deploy alongside a generation model on a single GPU. Reaching this level of ColBERT retrieval quality previously required models with 600M+ parameters. The full data transparency (training data and decontaminated evals) is unusual and makes these results easier to reproduce than those of most retrieval model releases.
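The memory figure follows from simple arithmetic, sketched below. The helper name is hypothetical, the estimate covers weights only (no activations, framework overhead, or document index), and the FP16 line is an added assumption for comparison, not a figure from the release.

```python
def weight_memory_mb(num_params, bytes_per_param):
    """Approximate size of the model weights in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6

PARAMS = 149_000_000  # parameter count of each model

fp32_mb = weight_memory_mb(PARAMS, 4)  # 4 bytes per FP32 parameter
fp16_mb = weight_memory_mb(PARAMS, 2)  # 2 bytes per FP16 parameter (assumption)

print(f"FP32 weights: {fp32_mb:.0f} MB")
print(f"FP16 weights: {fp16_mb:.0f} MB")
```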

Who is it for?

Teams building RAG or semantic search pipelines who need high-recall retrieval without a large model footprint.

Try it

pip install sentence-transformers pylate
python -c "from sentence_transformers import SentenceTransformer; model = SentenceTransformer('lightonai/DenseOn')"