Overview
Grok 1 is the first model from xAI, Elon Musk's AI company, announced on November 3, 2023. xAI presented it as a very early beta — in their own words, "the best we could do with 2 months of training" — and seeded it to a small set of US users before rolling it into the X (formerly Twitter) Premium+ subscription tier. Grok 1 was the engine behind the original Grok chatbot, pitched as a witty assistant with real-time access to information on X.
Under the hood, Grok 1 is a large Mixture-of-Experts (MoE) language model with 314 billion total parameters. For any given token it routes through 2 of its 8 expert networks, so only about a quarter of the weights are active at inference time. It has a 64-layer Transformer architecture, an 8,192-token context window, and a pre-training data cutoff of October 2023. At launch, xAI reported that Grok 1 outperformed GPT-3.5 and Llama 2 70B on standard benchmarks while trailing larger frontier models like GPT-4.
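The "2 of 8 experts per token" routing described above can be sketched in a few lines of NumPy. This is a minimal illustration of generic top-2 gating, not xAI's implementation: the function names `top2_route` and `moe_layer`, the softmax-over-selected-experts convention, and the toy shapes are all assumptions made for the example.

```python
import numpy as np

def top2_route(router_logits: np.ndarray, k: int = 2):
    """Pick the top-k experts per token and normalise their gate weights.

    router_logits has shape (num_tokens, num_experts).
    Returns (expert_ids, gates), each of shape (num_tokens, k).
    """
    # Indices of the k largest logits per token, highest first.
    expert_ids = np.argsort(-router_logits, axis=-1)[:, :k]
    chosen = np.take_along_axis(router_logits, expert_ids, axis=-1)
    # Softmax over only the selected experts (one common convention; assumed here).
    chosen = chosen - chosen.max(axis=-1, keepdims=True)
    gates = np.exp(chosen)
    gates /= gates.sum(axis=-1, keepdims=True)
    return expert_ids, gates

def moe_layer(x: np.ndarray, router_w: np.ndarray, experts, k: int = 2) -> np.ndarray:
    """Apply a sparse MoE layer.

    x: (num_tokens, d_model); router_w: (d_model, num_experts);
    experts: list of callables mapping (n, d_model) -> (n, d_model).
    """
    expert_ids, gates = top2_route(x @ router_w, k)
    out = np.zeros_like(x, dtype=float)
    for e, expert in enumerate(experts):
        # Tokens (rows) and top-k slots (columns) where expert e was selected.
        token_idx, slot_idx = np.nonzero(expert_ids == e)
        if token_idx.size == 0:
            continue  # this expert does no work for the current batch
        out[token_idx] += gates[token_idx, slot_idx, None] * expert(x[token_idx])
    return out
```

Each token only ever runs through the two experts it was routed to, which is why the remaining six experts contribute no compute for that token even though their weights still have to be held in memory.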
On March 17, 2024, xAI open-sourced Grok 1, releasing the network architecture and full weight parameters under the permissive Apache 2.0 license on GitHub and Hugging Face. At 314B parameters it was, at the time, the largest open-weights MoE model publicly available. The released checkpoint is the raw base model from pre-training, not the chat-tuned version that ran in the product. Running it requires substantial GPU memory because of its sheer size, and it needs additional fine-tuning to be useful as an assistant.
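To give a rough sense of what "substantial GPU memory" means, the sketch below estimates the memory needed just to hold 314B weights at a few storage precisions. The byte widths are standard for these number formats, but which precision a given deployment uses is an assumption, and the figures ignore activations, the KV cache, and framework overhead.

```python
TOTAL_PARAMS = 314e9  # total parameter count stated by xAI

# Bytes per parameter for a few common storage formats.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(TOTAL_PARAMS, dtype):,.0f} GB")
```

Even at one byte per parameter the weights come to roughly 314 GB, which is more than a single commodity accelerator holds, so a multi-GPU setup is the realistic baseline.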
| Field | Value |
|---|---|
| Released | 2023-11-03 |
| License | Apache-2.0 |
| Weights | Open weights |
| Parameters | 314B total (Mixture-of-Experts; 2 of 8 experts active per token, ~25% of weights) |
| Context | 8,192 tokens |
| Max output | Not separately specified by xAI (shares the 8,192-token sequence length) |
| Architecture | Decoder-only Transformer with a Mixture-of-Experts (MoE) feed-forward layer. 314B total parameters across 8 experts, with 2 experts (roughly 25% of weights) activated per token. 64 layers; 48 attention heads for queries and 8 for keys/values; 6,144-dimensional embeddings; rotary position embeddings (RoPE); SentencePiece tokenizer with a 131,072-token vocabulary. The open-weights release is the raw pre-training base checkpoint — it was not fine-tuned for chat, dialogue, or any specific application and had no RLHF/safety tuning applied. |
| Knowledge cutoff | October 2023 |
| Modalities | Text |
| Status | Superseded — succeeded by Grok 1.5 (May 2024) and later Grok versions. The base weights remain available as an open-weights release under Apache 2.0; the hosted chatbot has long since moved to newer models. |
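A few quantities follow directly from the shape figures in the table. The sketch below collects them in a small dataclass and derives the per-head width, the number of query heads sharing each key/value head, and an approximate KV-cache size. The class name `Grok1Shape` is hypothetical, and the even split of the embedding width across query heads and the 2-byte cache values are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grok1Shape:
    """Shape figures taken from the spec table; the derived values are estimates."""
    num_layers: int = 64
    d_model: int = 6144
    num_q_heads: int = 48
    num_kv_heads: int = 8
    context_len: int = 8192
    vocab_size: int = 131072

    @property
    def head_dim(self) -> int:
        # Assumption: query heads split the embedding width evenly.
        return self.d_model // self.num_q_heads

    @property
    def q_heads_per_kv_head(self) -> int:
        # How many query heads share one key/value head.
        return self.num_q_heads // self.num_kv_heads

    def kv_cache_bytes(self, tokens: int, bytes_per_value: int = 2) -> int:
        # Two cached tensors (K and V) per layer, each storing
        # num_kv_heads * head_dim values per token.
        per_token = 2 * self.num_layers * self.num_kv_heads * self.head_dim
        return per_token * tokens * bytes_per_value

shape = Grok1Shape()
print(shape.head_dim)                                   # 128
print(shape.q_heads_per_kv_head)                        # 6
print(shape.kv_cache_bytes(shape.context_len) / 2**30)  # 2.0 (GiB per sequence)
```

Under these assumptions, sharing 8 key/value heads across 48 query heads makes the cache one sixth the size it would be if every query head had its own keys and values.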
Benchmarks
- MMLU (5-shot): 73%
- HumanEval (coding, pass@1): 63.2%
- Hungarian national high school math exam (hand-graded): 59%
Scores are percentages on a 0 to 100 scale; higher is better.
Strengths
- Open weights under a fully permissive Apache 2.0 license — free to use, modify, and self-host
- Very large 314B-parameter Mixture-of-Experts design, the largest open MoE model when released
- Sparse activation (only ~25% of weights per token) keeps inference cheaper than a dense 314B model would be
- Strong-for-its-era results: beat GPT-3.5 and Llama 2 70B on several public benchmarks at launch
- Full architecture and JAX reference inference code published, useful for research and study
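The sparse-activation point in the list above can be put into rough numbers. The sketch below uses xAI's approximate 25% figure and a common rule of thumb of about 2 floating-point operations per active weight per token. Both are approximations: the rule of thumb ignores attention over the context, and the 25% figure does not separate out the non-expert weights that are always active.

```python
TOTAL_PARAMS = 314e9     # total parameters stated by xAI
ACTIVE_FRACTION = 0.25   # approximate share of weights used per token

def approx_flops_per_token(active_params: float) -> float:
    """Rule of thumb (assumption): about 2 FLOPs per active weight per token."""
    return 2.0 * active_params

active_params = TOTAL_PARAMS * ACTIVE_FRACTION
dense_flops = approx_flops_per_token(TOTAL_PARAMS)
sparse_flops = approx_flops_per_token(active_params)

print(f"active parameters per token: {active_params / 1e9:.1f}B")
print(f"dense 314B model:  {dense_flops / 1e9:.0f} GFLOPs per token")
print(f"sparse (2 of 8):   {sparse_flops / 1e9:.0f} GFLOPs per token")
```

The saving applies to compute per token only. All 314B weights still have to be resident in memory, because any expert may be selected for the next token.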
Best for
- Research into large-scale Mixture-of-Experts architectures and sparse inference
- A base model for teams that want to fine-tune their own assistant on permissively licensed weights
- Historical and educational study of xAI's first-generation model
- Self-hosted experimentation where data must stay on-premises and an open license is required
How to access
| Provider | Model ID |
|---|---|
| xAI (open weights, GitHub) | grok-1 |
| Hugging Face (open weights) | xai-org/grok-1 |
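For the Hugging Face route, one possible way to fetch the weights is the `snapshot_download` function from the third-party `huggingface_hub` package. This is a sketch under assumptions, not official instructions: the `ckpt-0/*` file pattern and the `checkpoints` target directory are assumed here and should be checked against the repository's own README, and the helper names are hypothetical.

```python
REPO_ID = "xai-org/grok-1"  # Hugging Face model ID from the table above

def download_kwargs(local_dir: str = "checkpoints") -> dict:
    """Build the arguments for huggingface_hub.snapshot_download."""
    return {
        "repo_id": REPO_ID,
        "repo_type": "model",
        # Assumption: the checkpoint shards live under ckpt-0/ in the repo.
        "allow_patterns": ["ckpt-0/*"],
        "local_dir": local_dir,
    }

def download_grok1(local_dir: str = "checkpoints") -> str:
    """Download the checkpoint and return the local path.

    Requires `pip install huggingface_hub`, network access, and
    several hundred gigabytes of free disk space.
    """
    from huggingface_hub import snapshot_download  # imported lazily on purpose
    return snapshot_download(**download_kwargs(local_dir))

# Usage (not run here because of the download size):
# path = download_grok1("checkpoints")
```

The download only provides the raw base checkpoint; loading and running it still needs the reference inference code and enough accelerator memory.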
FAQ
When was Grok 1 released, and is it still in use?
xAI announced Grok 1 on November 3, 2023, as an early beta, initially for a small set of US users and then through the X Premium+ subscription. It has since been superseded by Grok 1.5 (May 2024) and later Grok models. The base weights remain freely available as an open-weights release, but the hosted Grok chatbot now runs on much newer versions.
Are Grok 1's weights really open source?
Yes. On March 17, 2024, xAI published the full network architecture and weight parameters on GitHub and Hugging Face under the Apache 2.0 license, making it free to use, modify, and self-host. At 314B parameters it was the largest open Mixture-of-Experts model available at the time.
How big is Grok 1 and what architecture does it use?
Grok 1 is a Mixture-of-Experts (MoE) Transformer with 314 billion total parameters. It has 8 experts and activates 2 of them (about 25% of the weights) for each token. It uses 64 layers, an 8,192-token context window, rotary position embeddings, and a 131,072-token SentencePiece vocabulary.
Can I call Grok 1 through an API?
There was never an official xAI pay-as-you-go API for Grok 1. It was originally accessible only via an X Premium+ subscription, and the open-weights release is meant to be downloaded and self-hosted. Note that the released checkpoint is the raw pre-training base model — it was not chat- or instruction-tuned, so it needs fine-tuning to behave like an assistant.