Mistral NeMo (12B)

Name: Mistral NeMo (12B)
Author: Mistral AI

A 12B open dense model with a 128K context, built with NVIDIA

Overview

Mistral NeMo is a 12-billion-parameter open dense language model that Mistral AI built jointly with NVIDIA and released on July 18, 2024. It ships under the permissive Apache 2.0 license, with both a base checkpoint (Mistral-Nemo-Base-2407) and an instruction-tuned checkpoint (Mistral-Nemo-Instruct-2407) on Hugging Face. The model offers a 128K-token context window and was positioned by Mistral as a state-of-the-art replacement for the older Mistral 7B in its size class.

A distinguishing feature of Mistral NeMo is its new tokenizer, called Tekken, which is based on Tiktoken and trained on over 100 languages. Mistral reports that Tekken compresses natural-language text and source code more efficiently than the SentencePiece tokenizer used in earlier Mistral models, with especially large gains for languages such as Korean and Arabic. The model is multilingual by design, with Mistral highlighting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

Mistral NeMo was trained with quantization awareness so it can run FP8 inference without losing accuracy, and it was designed to fit on a single GPU, making it practical for local and self-hosted deployment. On Mistral's own platform the model was served as open-mistral-nemo-2407. Mistral has since marked that API endpoint as deprecated (deprecation date 5/22/2026, retirement 7/31/2026) and points API users to Ministral 3 8B as the recommended successor, but the open weights remain freely available under Apache 2.0 and are widely mirrored through tools like Ollama and LM Studio.

Released	2024-07-18
License	Apache 2.0
Weights	Open weights
Parameters	12B
Context	128K
Max output	Not specified by Mistral AI
Architecture	Dense transformer decoder. 40 layers, model dimension 5,120, 32 attention heads with 8 key-value heads (grouped-query attention), head dimension 128, hidden dimension 14,436, SwiGLU activation, and rotary position embeddings with theta = 1M. Vocabulary of roughly 131K tokens (2^17) via the new Tekken tokenizer. Trained with quantization awareness to support FP8 inference. Uses a standard architecture so it works as a drop-in replacement for Mistral 7B.
Knowledge cutoff	Not disclosed by Mistral AI
Modalities	Text
Status	Available (open weights); API deprecated

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.15 / 1M tokens per 1M tokens
Output	$0.15 / 1M tokens per 1M tokens

First-party Mistral la Plateforme pricing for open-mistral-nemo. The API endpoint is deprecated (retirement 7/31/2026); open weights are free to self-host under Apache 2.0. Third-party hosts may price differently.

Pricing source ↗

Strengths

Apache 2.0 open weights — base and instruct checkpoints are freely usable, modifiable, and self-hostable
Large 128K-token context window, trained natively rather than extended after the fact
Efficient Tekken tokenizer covering 100+ languages, with strong multilingual and code compression
Quantization-aware training enables FP8 inference with no reported accuracy loss
Compact 12B size fits on a single GPU, making it cheap to run locally
Drop-in replacement for Mistral 7B thanks to a standard transformer architecture

Best for

Self-hosted chat and assistant applications on a single GPU
Multilingual text generation and translation across 100+ languages
Long-document summarization and analysis using the 128K context
Function calling and tool use in lightweight agent workflows
On-premises or privacy-sensitive deployments where open weights are required
Code generation and completion for everyday programming tasks

How to access

Provider	Model ID
Mistral AI (la Plateforme) ↗	`open-mistral-nemo-2407`

Mistral 7B / Nemo (open dense) — every version

The full lineage of the Mistral 7B / Nemo (open dense) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Mistral NeMo (12B)current	2024-07-18	—	Apache-2.0
Mistral 7B	2023-09-27	—	Apache-2.0

FAQ

Is Mistral NeMo open source?

Yes. Mistral NeMo's base and instruction-tuned weights are released under the Apache 2.0 license, which permits commercial use, modification, and redistribution. The weights are published on Hugging Face as Mistral-Nemo-Base-2407 and Mistral-Nemo-Instruct-2407.

How many parameters and how large is the context window?

Mistral NeMo has 12 billion parameters and a 128K-token context window. It is a dense transformer with 40 layers and grouped-query attention (32 attention heads, 8 key-value heads).

Who built Mistral NeMo?

It was built jointly by Mistral AI and NVIDIA and released on July 18, 2024. The model was trained on NVIDIA's DGX Cloud platform and is also packaged as an NVIDIA NIM microservice.

Is Mistral NeMo still available?

The open weights remain freely available under Apache 2.0 and are widely used for self-hosting. On Mistral's own API the open-mistral-nemo-2407 endpoint has been deprecated (retirement 7/31/2026), with Ministral 3 8B recommended as the successor.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Mistral 7B / Nemo (open dense) — every version

// FAQ