AI/TLDR

Command R7B (12-2024)

Cohere's smallest, fastest R-series model — a 7B open-weights LLM with 128K context, tuned for RAG, tool use, and agents on commodity hardware.

Overview

Command R7B (12-2024) is the smallest, fastest, and final model in Cohere's R series of enterprise large language models, announced on December 13, 2024. At 7 billion parameters with a 128K-token context window, it is designed for high-throughput, latency-sensitive deployments and is compact enough to run on commodity GPUs, edge devices, and even CPUs and MacBooks.

The model is purpose-built for the workloads Cohere targets in production: retrieval-augmented generation (RAG), tool use, and multistep ReAct-style agents that require complex reasoning and active information seeking. It also handles summarization, question answering, and code, and supports 23 languages including English, French, Spanish, German, Japanese, Korean, Arabic, Chinese, and Hindi.

Command R7B ships as open weights on Hugging Face under a non-commercial CC-BY-NC license (the CohereLabs/c4ai-command-r7b-12-2024 release) for research, and is available for commercial use through the Cohere API as model `command-r7b-12-2024`. On the Hugging Face Open LLM Leaderboard it posts an average of 31.4, leading with strong IFEval instruction-following while keeping per-token costs among the lowest of any production model.

Released2024-12-13
LicenseCC-BY-NC 4.0 (open weights, non-commercial; commercial use via Cohere API)
WeightsOpen weights
Parameters7B
Context128K
Max output4K
ArchitectureAuto-regressive optimized transformer. Three sliding-window-attention layers (4096-token window) with RoPE positional encoding, plus a fourth layer with global attention across the full sequence; no positional embeddings on the global-attention layer.
ModalitiesText
StatusAvailable

Benchmarks

  1. Open LLM Leaderboard (Average)31.4%
  2. IFEval77.9%
  3. BBH (Big-Bench Hard)36.1%
  4. MATH (hard)26.4%
  5. MMLU-Pro28.5%
  6. MuSR11.6%
  7. GPQA7.7%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.0375 / 1M tokens per 1M tokens
Output$0.15 / 1M tokens per 1M tokens

Pricing source ↗

Strengths

  • Smallest and fastest model in Cohere's R series — runs on commodity GPUs, edge devices, and CPUs
  • Strong instruction following (77.9 on IFEval) and competitive RAG/tool-use performance for its size
  • 128K-token context window despite a 7B footprint
  • Multilingual across 23 languages
  • Built-in support for RAG with grounded citations, tool/function calling, and multistep ReAct agents
  • Very low API pricing ($0.0375 input / $0.15 output per 1M tokens)
  • Open weights on Hugging Face for research and self-hosting

Best for

  • High-volume, latency-sensitive RAG over long documents and knowledge bases
  • Tool-using and ReAct-style agents in dynamic, real-world environments
  • On-device and edge inference where a compact model is required
  • Multilingual question answering and summarization across 23 languages
  • Cost-sensitive classification, extraction, and simple Q&A at scale
  • Financial and numerical information extraction in conversational settings

How to access

ProviderModel ID
Cohere ↗command-r7b-12-2024
OpenRouter ↗cohere/command-r7b-12-2024

FAQ

How many parameters does Command R7B have?

Command R7B (12-2024) is a 7-billion-parameter model — the smallest in Cohere's Command R series. Its compact size lets it run on commodity GPUs, edge devices, and even CPUs and MacBooks.

What is Command R7B's context window?

Command R7B supports a 128K-token context window and a maximum of 4K output tokens, making it suitable for long-document RAG despite its small size.

Is Command R7B open source?

The weights are openly released on Hugging Face (CohereLabs/c4ai-command-r7b-12-2024) under a non-commercial CC-BY-NC 4.0 license for research. Commercial use is available through the Cohere API as model `command-r7b-12-2024`.

How much does Command R7B cost to use?

Via API it is priced at roughly $0.0375 per 1M input tokens and $0.15 per 1M output tokens — among the lowest rates of any production LLM, which suits high-volume RAG, classification, and agent workloads.