Command A Reasoning (08-2025)

Name: Command A Reasoning (08-2025)
Author: Cohere

Cohere's first reasoning model, built for secure enterprise agents and RAG.

Overview

Command A Reasoning (08-2025) is Cohere's first reasoning model and the reasoning flagship of its Command A line, released on August 21, 2025. It is a 111-billion-parameter dense transformer with a 256K-token context window (128K when served on a single GPU) and up to 32K tokens of output. Unlike a fixed reasoning model, Command A Reasoning lets you switch its 'thinking' mode on or off and set an explicit token budget, so a team can dial up deliberate multi-step reasoning for hard problems or turn it off for fast, low-latency replies on the same model.

Cohere positions Command A Reasoning squarely at the enterprise: it is tuned for tool use, agents, retrieval-augmented generation (RAG) with grounded citations, and multilingual customer-service and research workflows across 23 languages, including English, French, Spanish, German, Japanese, Korean, Arabic, Chinese and Hindi. The model is text-only (no image, audio or video input). Cohere reports that it outperforms comparable models such as gpt-oss-120b, DeepSeek-R1 (0528) and Mistral Magistral Medium on agentic and research benchmarks like BFCL-v3, Tau-bench and DeepResearch Bench.

The model is open-weights: the 111B checkpoint is published on Hugging Face under a CC-BY-NC 4.0 license (non-commercial research use; commercial deployment requires a license from Cohere). It is also available through Cohere's API as command-a-reasoning-08-2025 and on cloud platforms such as Oracle OCI Generative AI. Because it is sized to run on roughly one to two H100 or A100 GPUs, it is unusually practical to self-host for a frontier-class reasoning model, which is a core part of Cohere's privacy-first, on-prem enterprise pitch.

Released	2025-08-21
License	CC-BY-NC 4.0 (non-commercial; commercial use requires a Cohere license)
Weights	Open weights
Parameters	111B
Context	256K
Max output	32K tokens
Architecture	Auto-regressive dense transformer. Attention is hybrid: three layers use sliding-window attention (4,096-token window) with RoPE for local context, and every fourth layer uses global attention across the full sequence to handle the 256K context efficiently. Reasoning ("thinking") can be turned on or off, and when on the model exposes a configurable token budget that lets you trade latency and cost against answer quality. The 111B weights are sized to run on roughly 1-2 H100 or A100 GPUs.
Knowledge cutoff	June 1, 2024
Modalities	Text
Status	Available

Benchmarks

BFCL-v3, Tau-bench, DeepResearch Bench (agentic/research suite)0qualitative

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	Not publicly listed / 1M tokens
Output	Not publicly listed / 1M tokens

Command A Reasoning has no published per-token price. On Cohere's platform it is free on trial keys until rate limits; production access requires contacting Cohere sales. (The separately-priced non-reasoning Command A model lists at $2.50 / $10.00 per 1M input/output tokens, but that rate is not confirmed for Command A Reasoning.)

Pricing source ↗

Strengths

Toggleable reasoning: turn 'thinking' on for hard, multi-step problems or off for fast low-latency answers, all on one model
Token-budget control lets you cap reasoning spend per request to manage cost and latency
Large 256K-token context for long documents, transcripts and multi-turn agent histories
Open weights on Hugging Face (CC-BY-NC 4.0) — inspectable and self-hostable for research
Efficient footprint: a 111B model that runs on roughly 1-2 H100/A100 GPUs
Strong enterprise tooling: native tool use, agents and RAG with grounded citations
Broad multilingual coverage across 23 languages

Best for

Enterprise customer-service and support agents in 23 languages
Retrieval-augmented generation (RAG) over private documents with grounded citations
Autonomous and ReAct-style agents that call tools, search and databases
Deep-research assistants that produce long, well-sourced reports
Long-document review and analysis using the 256K context window
Privacy-sensitive deployments that need on-prem or single-cloud self-hosting
Latency-vs-quality tuning, dialing reasoning budget up or down per workload

How to access

Provider	Model ID
Cohere ↗	`command-a-reasoning-08-2025`
Oracle OCI Generative AI ↗	`cohere.command-a-reasoning-08-2025`

FAQ

What is Command A Reasoning (08-2025)?

It is Cohere's first reasoning model and the reasoning flagship of its Command A line, released August 21, 2025. It is a 111-billion-parameter, text-only LLM with a 256K-token context window, built for enterprise agents, tool use, RAG and multilingual workflows across 23 languages.

Is Command A Reasoning open-weights, and what is the license?

Yes. The 111B weights are published on Hugging Face under a CC-BY-NC 4.0 license, which allows non-commercial research use. Commercial deployment requires a license from Cohere.

What makes Command A Reasoning different from a standard reasoning model?

Its reasoning ('thinking') mode can be turned on or off on the same model, and when on you can set an explicit token budget. That lets you trade latency and cost against answer quality per request, rather than always paying for full reasoning.

How much does Command A Reasoning cost?

Cohere does not publish a per-token price for Command A Reasoning. It is free on trial keys until rate limits, and production access requires contacting Cohere sales. The separate non-reasoning Command A model is priced at $2.50/$10.00 per 1M input/output tokens, but that rate is not confirmed for the reasoning variant.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// FAQ