AI/TLDR

Zyphra · 2026-05-06 · major

Zyphra ZAYA1-8B — AMD-Trained MoE Reasoning Model With <1B Active Parameters

Open-weight 8.4B mixture-of-experts with only 760M active parameters, trained end-to-end on 1,024 AMD MI300X GPUs. Hits 89.1 on AIME26 and 71.6 on HMMT, matching open models 10–100x larger. Apache 2.0.


Zyphra trained an 8.4B sparse MoE with under a billion active params on AMD MI300X — and it matches open models 10–100x larger on math and code.

Key specs

Parameters: 8.4B
Active params: 760M
GPQA: 71.0
AIME26: 89.1
HMMT Feb 26: 71.6
LiveCodeBench v6: 65.8
Training GPUs: 1,024 MI300X

What is it?

ZAYA1-8B is a small mixture-of-experts language model from Zyphra: 8.4B total parameters, of which only about 760M are active per token. It is a reasoning-first open-weight model targeting math, code, and long-form analysis, and it was trained end-to-end on AMD hardware rather than NVIDIA, reportedly a first for a high-profile open-weight release at this scale.
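To make the sparsity concrete, here is a quick back-of-the-envelope calculation using only the two parameter counts quoted above. The variable names are illustrative, and the result says nothing about memory footprint, since all 8.4B weights still need to be stored.

```python
# Parameter counts as quoted in this article; everything else is arithmetic.
TOTAL_PARAMS = 8.4e9    # total parameters across all experts
ACTIVE_PARAMS = 760e6   # parameters used for any single token

# Share of the model that actually runs per token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# How many times larger the full model is than its active slice.
sparsity_ratio = TOTAL_PARAMS / ACTIVE_PARAMS

print(f"active fraction: {active_fraction:.1%}")  # about 9.0%
print(f"total / active:  {sparsity_ratio:.1f}x")  # about 11.1x
```

In other words, roughly one parameter in eleven participates in each forward pass, which is where the per-token compute savings come from.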

How does it work?

Zyphra trained on 1,024 AMD Instinct MI300X GPUs with Pensando Pollara networking on IBM Cloud. The architecture combines a Compressed Convolutional Attention (CCA) variant, an MLP-based expert router that improves routing stability, and learned residual scaling. Post-training adds a reasoning RL cascade and a new test-time compute method called Markovian RSA, which chunks parallel reasoning traces to keep memory constant during long deliberation.
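For readers unfamiliar with MoE routing, the toy sketch below shows the general idea of an MLP-based router: a small network scores the experts for each token, and only the top-scoring few are run. This is not Zyphra's implementation. The layer sizes, expert count, top-k value, and the function names `mlp_router` and `top_k_route` are all hypothetical, and CCA, residual scaling, and Markovian RSA are not modeled here.

```python
import math
import random


def mlp_router(x, w1, w2):
    """Score experts with a one-hidden-layer MLP, then apply a softmax."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(probs, k):
    """Keep the k most probable experts and renormalize their weights."""
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


# Hypothetical sizes, chosen only to keep the example small.
DIM, HIDDEN, NUM_EXPERTS, TOP_K = 8, 16, 16, 2

rng = random.Random(0)  # fixed seed so runs are repeatable
w1 = [[rng.gauss(0, 0.5) for _ in range(DIM)] for _ in range(HIDDEN)]
w2 = [[rng.gauss(0, 0.5) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
token = [rng.gauss(0, 1) for _ in range(DIM)]

probs = mlp_router(token, w1, w2)
routes = top_k_route(probs, TOP_K)
print(routes)  # list of (expert_index, mixing_weight) pairs
```

Only the experts listed in `routes` would run for this token, so the active parameter count stays a small fraction of the total no matter how many experts the model holds.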

Why does it matter?

Two things are notable. First, it suggests AMD's MI300X stack is now viable for end-to-end frontier-style training, not just inference. Second, a model with under 1B active parameters scoring 89.1 on AIME26 and 71.6 on HMMT supports Zyphra's claim that you can win on intelligence per parameter without scaling total size to hundreds of billions, which is meaningful for on-device and edge deployment.

Who is it for?

Open-weight researchers, anyone optimizing reasoning per FLOP, and teams building on the AMD stack.

Try it

huggingface.co/Zyphra/ZAYA1-8B or serverless via cloud.zyphra.com


Tags

  • zyphra
  • moe
  • open-weight
  • amd-mi300x
  • reasoning
  • small-model
  • math
  • code
  • apache-2-0
