Overview
Command A Reasoning (08-2025) is Cohere's first reasoning model and the reasoning flagship of its Command A line, released on August 21, 2025. It is a 111-billion-parameter dense transformer with a 256K-token context window (128K when served on a single GPU) and up to 32K tokens of output. Unlike a fixed reasoning model, Command A Reasoning lets you switch its 'thinking' mode on or off and set an explicit token budget, so a team can dial up deliberate multi-step reasoning for hard problems or turn it off for fast, low-latency replies on the same model.
Cohere positions Command A Reasoning squarely at the enterprise: it is tuned for tool use, agents, retrieval-augmented generation (RAG) with grounded citations, and multilingual customer-service and research workflows across 23 languages, including English, French, Spanish, German, Japanese, Korean, Arabic, Chinese and Hindi. The model is text-only (no image, audio or video input). Cohere reports that it outperforms comparable models such as gpt-oss-120b, DeepSeek-R1 (0528) and Mistral Magistral Medium on agentic and research benchmarks like BFCL-v3, Tau-bench and DeepResearch Bench.
The model is open-weights: the 111B checkpoint is published on Hugging Face under a CC-BY-NC 4.0 license (non-commercial research use; commercial deployment requires a license from Cohere). It is also available through Cohere's API as command-a-reasoning-08-2025 and on cloud platforms such as Oracle OCI Generative AI. Because it is sized to run on roughly one to two H100 or A100 GPUs, it is unusually practical to self-host for a frontier-class reasoning model, which is a core part of Cohere's privacy-first, on-prem enterprise pitch.
| Released | 2025-08-21 |
|---|---|
| License | CC-BY-NC 4.0 (non-commercial; commercial use requires a Cohere license) |
| Weights | Open weights |
| Parameters | 111B |
| Context | 256K |
| Max output | 32K tokens |
| Architecture | Auto-regressive dense transformer. Attention is hybrid: three layers use sliding-window attention (4,096-token window) with RoPE for local context, and every fourth layer uses global attention across the full sequence to handle the 256K context efficiently. Reasoning ("thinking") can be turned on or off, and when on the model exposes a configurable token budget that lets you trade latency and cost against answer quality. The 111B weights are sized to run on roughly 1-2 H100 or A100 GPUs. |
| Knowledge cutoff | June 1, 2024 |
| Modalities | Text |
| Status | Available |
Benchmarks
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | Not publicly listed / 1M tokens |
|---|---|
| Output | Not publicly listed / 1M tokens |
Command A Reasoning has no published per-token price. On Cohere's platform it is free on trial keys until rate limits; production access requires contacting Cohere sales. (The separately-priced non-reasoning Command A model lists at $2.50 / $10.00 per 1M input/output tokens, but that rate is not confirmed for Command A Reasoning.)
Strengths
- Toggleable reasoning: turn 'thinking' on for hard, multi-step problems or off for fast low-latency answers, all on one model
- Token-budget control lets you cap reasoning spend per request to manage cost and latency
- Large 256K-token context for long documents, transcripts and multi-turn agent histories
- Open weights on Hugging Face (CC-BY-NC 4.0) — inspectable and self-hostable for research
- Efficient footprint: a 111B model that runs on roughly 1-2 H100/A100 GPUs
- Strong enterprise tooling: native tool use, agents and RAG with grounded citations
- Broad multilingual coverage across 23 languages
Best for
- Enterprise customer-service and support agents in 23 languages
- Retrieval-augmented generation (RAG) over private documents with grounded citations
- Autonomous and ReAct-style agents that call tools, search and databases
- Deep-research assistants that produce long, well-sourced reports
- Long-document review and analysis using the 256K context window
- Privacy-sensitive deployments that need on-prem or single-cloud self-hosting
- Latency-vs-quality tuning, dialing reasoning budget up or down per workload
How to access
| Provider | Model ID |
|---|---|
| Cohere ↗ | command-a-reasoning-08-2025 |
| Oracle OCI Generative AI ↗ | cohere.command-a-reasoning-08-2025 |
FAQ
What is Command A Reasoning (08-2025)?
It is Cohere's first reasoning model and the reasoning flagship of its Command A line, released August 21, 2025. It is a 111-billion-parameter, text-only LLM with a 256K-token context window, built for enterprise agents, tool use, RAG and multilingual workflows across 23 languages.
Is Command A Reasoning open-weights, and what is the license?
Yes. The 111B weights are published on Hugging Face under a CC-BY-NC 4.0 license, which allows non-commercial research use. Commercial deployment requires a license from Cohere.
What makes Command A Reasoning different from a standard reasoning model?
Its reasoning ('thinking') mode can be turned on or off on the same model, and when on you can set an explicit token budget. That lets you trade latency and cost against answer quality per request, rather than always paying for full reasoning.
How much does Command A Reasoning cost?
Cohere does not publish a per-token price for Command A Reasoning. It is free on trial keys until rate limits, and production access requires contacting Cohere sales. The separate non-reasoning Command A model is priced at $2.50/$10.00 per 1M input/output tokens, but that rate is not confirmed for the reasoning variant.
