AI/TLDR

Command A Reasoning (08-2025)

Cohere's first reasoning model, built for secure enterprise agents and RAG.

Overview

Command A Reasoning (08-2025) is Cohere's first reasoning model and the reasoning flagship of its Command A line, released on August 21, 2025. It is a 111-billion-parameter dense transformer with a 256K-token context window (128K when served on a single GPU) and up to 32K tokens of output. Unlike a fixed reasoning model, Command A Reasoning lets you switch its 'thinking' mode on or off and set an explicit token budget, so a team can dial up deliberate multi-step reasoning for hard problems or turn it off for fast, low-latency replies on the same model.

Cohere positions Command A Reasoning squarely at the enterprise: it is tuned for tool use, agents, retrieval-augmented generation (RAG) with grounded citations, and multilingual customer-service and research workflows across 23 languages, including English, French, Spanish, German, Japanese, Korean, Arabic, Chinese and Hindi. The model is text-only (no image, audio or video input). Cohere reports that it outperforms comparable models such as gpt-oss-120b, DeepSeek-R1 (0528) and Mistral Magistral Medium on agentic and research benchmarks like BFCL-v3, Tau-bench and DeepResearch Bench.

The model is open-weights: the 111B checkpoint is published on Hugging Face under a CC-BY-NC 4.0 license (non-commercial research use; commercial deployment requires a license from Cohere). It is also available through Cohere's API as command-a-reasoning-08-2025 and on cloud platforms such as Oracle OCI Generative AI. Because it is sized to run on roughly one to two H100 or A100 GPUs, it is unusually practical to self-host for a frontier-class reasoning model, which is a core part of Cohere's privacy-first, on-prem enterprise pitch.

Released2025-08-21
LicenseCC-BY-NC 4.0 (non-commercial; commercial use requires a Cohere license)
WeightsOpen weights
Parameters111B
Context256K
Max output32K tokens
ArchitectureAuto-regressive dense transformer. Attention is hybrid: three layers use sliding-window attention (4,096-token window) with RoPE for local context, and every fourth layer uses global attention across the full sequence to handle the 256K context efficiently. Reasoning ("thinking") can be turned on or off, and when on the model exposes a configurable token budget that lets you trade latency and cost against answer quality. The 111B weights are sized to run on roughly 1-2 H100 or A100 GPUs.
Knowledge cutoffJune 1, 2024
ModalitiesText
StatusAvailable

Benchmarks

  1. BFCL-v3, Tau-bench, DeepResearch Bench (agentic/research suite)0qualitative

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

InputNot publicly listed / 1M tokens
OutputNot publicly listed / 1M tokens

Command A Reasoning has no published per-token price. On Cohere's platform it is free on trial keys until rate limits; production access requires contacting Cohere sales. (The separately-priced non-reasoning Command A model lists at $2.50 / $10.00 per 1M input/output tokens, but that rate is not confirmed for Command A Reasoning.)

Pricing source ↗

Strengths

  • Toggleable reasoning: turn 'thinking' on for hard, multi-step problems or off for fast low-latency answers, all on one model
  • Token-budget control lets you cap reasoning spend per request to manage cost and latency
  • Large 256K-token context for long documents, transcripts and multi-turn agent histories
  • Open weights on Hugging Face (CC-BY-NC 4.0) — inspectable and self-hostable for research
  • Efficient footprint: a 111B model that runs on roughly 1-2 H100/A100 GPUs
  • Strong enterprise tooling: native tool use, agents and RAG with grounded citations
  • Broad multilingual coverage across 23 languages

Best for

  • Enterprise customer-service and support agents in 23 languages
  • Retrieval-augmented generation (RAG) over private documents with grounded citations
  • Autonomous and ReAct-style agents that call tools, search and databases
  • Deep-research assistants that produce long, well-sourced reports
  • Long-document review and analysis using the 256K context window
  • Privacy-sensitive deployments that need on-prem or single-cloud self-hosting
  • Latency-vs-quality tuning, dialing reasoning budget up or down per workload

How to access

ProviderModel ID
Cohere ↗command-a-reasoning-08-2025
Oracle OCI Generative AI ↗cohere.command-a-reasoning-08-2025

FAQ

What is Command A Reasoning (08-2025)?

It is Cohere's first reasoning model and the reasoning flagship of its Command A line, released August 21, 2025. It is a 111-billion-parameter, text-only LLM with a 256K-token context window, built for enterprise agents, tool use, RAG and multilingual workflows across 23 languages.

Is Command A Reasoning open-weights, and what is the license?

Yes. The 111B weights are published on Hugging Face under a CC-BY-NC 4.0 license, which allows non-commercial research use. Commercial deployment requires a license from Cohere.

What makes Command A Reasoning different from a standard reasoning model?

Its reasoning ('thinking') mode can be turned on or off on the same model, and when on you can set an explicit token budget. That lets you trade latency and cost against answer quality per request, rather than always paying for full reasoning.

How much does Command A Reasoning cost?

Cohere does not publish a per-token price for Command A Reasoning. It is free on trial keys until rate limits, and production access requires contacting Cohere sales. The separate non-reasoning Command A model is priced at $2.50/$10.00 per 1M input/output tokens, but that rate is not confirmed for the reasoning variant.