AI/TLDR

Mistral NeMo (12B)

A 12B open dense model with a 128K context, built with NVIDIA

Overview

Mistral NeMo is a 12-billion-parameter open dense language model that Mistral AI built jointly with NVIDIA and released on July 18, 2024. It ships under the permissive Apache 2.0 license, with both a base checkpoint (Mistral-Nemo-Base-2407) and an instruction-tuned checkpoint (Mistral-Nemo-Instruct-2407) on Hugging Face. The model offers a 128K-token context window and was positioned by Mistral as a state-of-the-art replacement for the older Mistral 7B in its size class.

A distinguishing feature of Mistral NeMo is its new tokenizer, called Tekken, which is based on Tiktoken and trained on over 100 languages. Mistral reports that Tekken compresses natural-language text and source code more efficiently than the SentencePiece tokenizer used in earlier Mistral models, with especially large gains for languages such as Korean and Arabic. The model is multilingual by design, with Mistral highlighting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

Mistral NeMo was trained with quantization awareness so it can run FP8 inference without losing accuracy, and it was designed to fit on a single GPU, making it practical for local and self-hosted deployment. On Mistral's own platform the model was served as open-mistral-nemo-2407. Mistral has since marked that API endpoint as deprecated (deprecation date 5/22/2026, retirement 7/31/2026) and points API users to Ministral 3 8B as the recommended successor, but the open weights remain freely available under Apache 2.0 and are widely mirrored through tools like Ollama and LM Studio.

Released2024-07-18
LicenseApache 2.0
WeightsOpen weights
Parameters12B
Context128K
Max outputNot specified by Mistral AI
ArchitectureDense transformer decoder. 40 layers, model dimension 5,120, 32 attention heads with 8 key-value heads (grouped-query attention), head dimension 128, hidden dimension 14,436, SwiGLU activation, and rotary position embeddings with theta = 1M. Vocabulary of roughly 131K tokens (2^17) via the new Tekken tokenizer. Trained with quantization awareness to support FP8 inference. Uses a standard architecture so it works as a drop-in replacement for Mistral 7B.
Knowledge cutoffNot disclosed by Mistral AI
ModalitiesText
StatusAvailable (open weights); API deprecated

Benchmarks

  1. MMLU (5-shot)68%
  2. HellaSwag (0-shot)83.5%
  3. Winogrande (0-shot)76.8%
  4. OpenBookQA (0-shot)60.6%
  5. CommonSenseQA (0-shot)70.4%
  6. TruthfulQA (0-shot)50.3%
  7. TriviaQA (5-shot)73.8%
  8. NaturalQuestions (5-shot)31.2%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.15 / 1M tokens per 1M tokens
Output$0.15 / 1M tokens per 1M tokens

First-party Mistral la Plateforme pricing for open-mistral-nemo. The API endpoint is deprecated (retirement 7/31/2026); open weights are free to self-host under Apache 2.0. Third-party hosts may price differently.

Pricing source ↗

Strengths

  • Apache 2.0 open weights — base and instruct checkpoints are freely usable, modifiable, and self-hostable
  • Large 128K-token context window, trained natively rather than extended after the fact
  • Efficient Tekken tokenizer covering 100+ languages, with strong multilingual and code compression
  • Quantization-aware training enables FP8 inference with no reported accuracy loss
  • Compact 12B size fits on a single GPU, making it cheap to run locally
  • Drop-in replacement for Mistral 7B thanks to a standard transformer architecture

Best for

  • Self-hosted chat and assistant applications on a single GPU
  • Multilingual text generation and translation across 100+ languages
  • Long-document summarization and analysis using the 128K context
  • Function calling and tool use in lightweight agent workflows
  • On-premises or privacy-sensitive deployments where open weights are required
  • Code generation and completion for everyday programming tasks

How to access

ProviderModel ID
Mistral AI (la Plateforme) ↗open-mistral-nemo-2407

Mistral 7B / Nemo (open dense) — every version

The full lineage of the Mistral 7B / Nemo (open dense) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Mistral NeMo (12B)current2024-07-18Apache-2.0
Mistral 7B2023-09-27Apache-2.0

FAQ

Is Mistral NeMo open source?

Yes. Mistral NeMo's base and instruction-tuned weights are released under the Apache 2.0 license, which permits commercial use, modification, and redistribution. The weights are published on Hugging Face as Mistral-Nemo-Base-2407 and Mistral-Nemo-Instruct-2407.

How many parameters and how large is the context window?

Mistral NeMo has 12 billion parameters and a 128K-token context window. It is a dense transformer with 40 layers and grouped-query attention (32 attention heads, 8 key-value heads).

Who built Mistral NeMo?

It was built jointly by Mistral AI and NVIDIA and released on July 18, 2024. The model was trained on NVIDIA's DGX Cloud platform and is also packaged as an NVIDIA NIM microservice.

Is Mistral NeMo still available?

The open weights remain freely available under Apache 2.0 and are widely used for self-hosting. On Mistral's own API the open-mistral-nemo-2407 endpoint has been deprecated (retirement 7/31/2026), with Ministral 3 8B recommended as the successor.