AI/TLDR

Aya Expanse (8B & 32B)

Cohere Labs' open-weight multilingual LLMs in 8B and 32B sizes across 23 languages — the 32B beats models more than twice its size on multilingual benchmarks.

Overview

Aya Expanse is Cohere Labs' open-weight multilingual model line, released on 24 October 2024 in two sizes — Aya Expanse 8B and Aya Expanse 32B. Both are auto-regressive transformer language models built specifically to close the gap between English-centric LLMs and the rest of the world's languages, with strong performance across 23 languages: Arabic, Chinese (simplified and traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian and Vietnamese.

The standout result is efficiency relative to size. On m-ArenaHard — a multilingual version of Arena-Hard-Auto translated into all 23 languages and judged by GPT-4o — Aya Expanse 32B reaches a 54.0% win rate against Llama 3.1 70B, a model more than twice its size, and also outperforms Gemma 2 27B and Mixtral 8x22B. The smaller Aya Expanse 8B beats leading open-weight models in its class (Gemma 2 9B, Llama 3.1 8B and Ministral 8B) with win rates ranging from 60.4% to 70.6%. These gains come from combining data arbitrage, multilingual preference training, safety tuning and model merging, described in the accompanying paper.

Aya Expanse is text-in, text-out only. The open weights are published on Hugging Face and Kaggle under a CC-BY-NC (non-commercial research) license with Cohere's Acceptable Use Policy. Both models are also served through the Cohere Chat API as c4ai-aya-expanse-8b and c4ai-aya-expanse-32b; the 32B (128K context) remains live, while the 8B (8K context) was retired from the API on 4 April 2026. The line later expanded into multimodal territory with Aya Vision (March 2025), whose 32B variant uses Aya Expanse 32B as its language backbone.

Released2024-10-24
LicenseCC-BY-NC (with Cohere Lab's Acceptable Use Policy)
WeightsOpen weights
ParametersTwo variants — 8B and 32B parameters
Context128K (32B) · 8K (8B)
Max outputUndisclosed
ArchitectureAuto-regressive language model using an optimized transformer architecture. Released in two sizes (8B and 32B). Post-training combined supervised fine-tuning, multilingual preference training, safety tuning, and model merging, plus a data-arbitrage strategy for sourcing high-quality multilingual training data — the research breakthroughs detailed in the Aya Expanse paper.
Knowledge cutoffUndisclosed
ModalitiesText
StatusGenerally available (8B variant retired on the Cohere API on 2026-04-04; open weights remain available)

Benchmarks

  1. m-ArenaHard win rate — Aya Expanse 32B vs Llama 3.1 70B (GPT-4o judge)54% win rate
  2. m-ArenaHard win rate — Aya Expanse 8B vs Gemma 2 9B (GPT-4o judge)60.4% win rate
  3. m-ArenaHard win rate — Aya Expanse 8B vs Llama 3.1 8B (GPT-4o judge)70.6% win rate

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.50 / 1M tokens
Output$1.50 / 1M tokens

Pricing for both Aya Expanse variants on the Cohere API. The 8B variant was retired from the API on 4 April 2026; the 32B remains available.

Pricing source ↗

Strengths

  • Best-in-class multilingual quality for its size — the 32B beats Llama 3.1 70B (54.0% m-ArenaHard win rate), a model more than twice as large
  • Broad language coverage: strong performance across 23 languages, including lower-resource ones
  • Aya Expanse 8B outperforms Gemma 2 9B, Llama 3.1 8B and Ministral 8B in its parameter class (60.4%-70.6% win rates)
  • Open weights on Hugging Face and Kaggle, so the models can be self-hosted for research
  • 32B variant offers a long 128K-token context window via the Cohere API
  • Built on published research — data arbitrage, multilingual preference training, safety tuning and model merging

Best for

  • Multilingual chat and assistants spanning all 23 supported languages
  • Cross-lingual question answering, summarization and rewriting
  • Machine translation and language-bridging tasks
  • Research on multilingual alignment, preference training and model merging
  • Self-hosted deployments where open weights are required (non-commercial use)
  • Serving lower-resource languages where English-centric models underperform

How to access

ProviderModel ID
Cohere ↗c4ai-aya-expanse-32b
Hugging Face (download weights) ↗CohereLabs/aya-expanse-32b
Hugging Face (download weights) ↗CohereLabs/aya-expanse-8b

FAQ

What is Aya Expanse?

Aya Expanse is Cohere Labs' open-weight multilingual language model line, released on 24 October 2024 in two sizes — 8B and 32B parameters. Both are auto-regressive transformer models optimized for strong performance across 23 languages, built to close the gap between English-centric LLMs and the rest of the world's languages.

How does Aya Expanse 32B compare to bigger models?

On m-ArenaHard — a multilingual benchmark translated into all 23 supported languages and judged by GPT-4o — Aya Expanse 32B reaches a 54.0% win rate against Llama 3.1 70B, a model more than twice its size, and also outperforms Gemma 2 27B and Mixtral 8x22B. The smaller 8B model beats Gemma 2 9B, Llama 3.1 8B and Ministral 8B in its class with win rates of 60.4% to 70.6%.

Is Aya Expanse open source and free to use commercially?

The weights for both the 8B and 32B models are published on Hugging Face and Kaggle, but under a CC-BY-NC license (with Cohere Lab's Acceptable Use Policy), which permits non-commercial research use only — not commercial deployment. You can also call the models through the Cohere Chat API at $0.50/1M input tokens and $1.50/1M output tokens.

How do I access Aya Expanse through an API?

Both variants are served through the Cohere Chat API as c4ai-aya-expanse-8b and c4ai-aya-expanse-32b. The 32B (128K context) remains live; the 8B (8K context) was retired from the Cohere API on 4 April 2026. The open weights remain downloadable from Hugging Face and Kaggle, and the 32B can also be run locally via Ollama.