Muse Spark

Name: Muse Spark
Author: Meta

Meta Superintelligence Labs' first frontier model — a small, fast, natively multimodal reasoner that leads on health.

Overview

Muse Spark, released on April 8, 2026, is the first model from Meta Superintelligence Labs and Meta's first new flagship since Llama 4 in April 2025. It marks a strategic break for Meta: where the Llama line shipped open weights, Muse Spark is proprietary — Meta's first frontier model not released as open weights, though the company says it hopes to open-source future versions. It is positioned as a small, fast, natively multimodal reasoning model purpose-built for Meta's products.

On the Artificial Analysis Intelligence Index, Muse Spark scores 52 — roughly tripling Llama 4 Maverick's 18 and placing it fourth overall, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. It is notably token-efficient, using about 58M output tokens to run the Index, in the same range as Gemini 3.1 Pro and far below Claude Opus 4.6 and GPT-5.4. Muse Spark is also the second-strongest vision model Artificial Analysis benchmarked, scoring 80.5% on MMMU-Pro.
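As a quick check on the "roughly tripling" claim, the ratio of the two Index scores quoted above can be computed directly. The variable names are illustrative; only the two scores come from the text.

```python
# Index scores as quoted in the paragraph above.
muse_spark_index = 52
llama4_maverick_index = 18

# Ratio of the two scores; a value just under 3 supports "roughly tripling".
ratio = muse_spark_index / llama4_maverick_index
print(f"{ratio:.2f}x")  # prints "2.89x"
```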

Muse Spark's standout strength is health: trained on data curated with more than 1,000 physicians, it leads the HealthBench Hard leaderboard at 0.428, ahead of every other frontier model tested. Its Contemplating mode lifts hard-reasoning scores to 58.4% on Humanity's Last Exam (with tools) and 38.3% on FrontierScience Research, competing with extreme reasoning modes such as Gemini Deep Think and GPT Pro. Today Muse Spark powers the Meta AI app and meta.ai for free and is rolling out across WhatsApp, Instagram, Facebook, Messenger, and AI glasses; API access is in private preview for select partners.

Released	2026-04-08
License	Proprietary (Meta) — Meta's first frontier model not released as open weights
Weights	Not released (closed weights); access is through Meta's apps, with an API in private preview
Parameters	Not disclosed by Meta
Context	262K tokens
Max output	Not disclosed by Meta
Architecture	Natively multimodal reasoning model. Meta Superintelligence Labs rebuilt its AI stack from the ground up to reach Llama 4 Maverick-level capability with over an order of magnitude less training compute. Reinforcement learning with a penalty on thinking time produces a "thought compression" effect — after first learning to think longer, the model compresses its reasoning to solve problems with far fewer tokens. A separate "Contemplating mode" scales test-time compute by orchestrating multiple agents that reason in parallel, then aggregating their solutions. Meta has not published parameter counts or whether the model is dense or mixture-of-experts.
Knowledge cutoff	Not disclosed by Meta
Modalities	Text, Vision, Audio
Status	Available
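The "thought compression" effect described in the Architecture row can be illustrated with a toy reward function. This is a minimal sketch under stated assumptions: Meta has not published its reward formulation, so the linear penalty, the function name `shaped_reward`, and the parameters `penalty_per_token` and `free_budget` are all hypothetical.

```python
def shaped_reward(correct: bool, thinking_tokens: int,
                  penalty_per_token: float = 1e-4,
                  free_budget: int = 0) -> float:
    """Task reward minus a linear penalty on thinking tokens.

    Assumption: a linear penalty applied beyond a free budget. The actual
    penalty used to train Muse Spark is not public.
    """
    base = 1.0 if correct else 0.0
    # Only tokens beyond the free budget are penalized.
    overage = max(0, thinking_tokens - free_budget)
    return base - penalty_per_token * overage


# Two correct answers: the shorter chain of thought earns the higher reward,
# which is the pressure that would push a policy toward compressed reasoning.
long_cot = shaped_reward(True, 8000)
short_cot = shaped_reward(True, 2000)
```

Under a reward of this shape, a policy is still paid for being correct, so it has no incentive to stop thinking altogether; it is only nudged to reach the same answer in fewer tokens.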

Benchmarks

Figure: Per-benchmark bar charts (Intelligence Evaluations) comparing Muse Spark with peer models across the 10 Intelligence Index benchmarks: GDPval-AA, Terminal-Bench Hard, Tau2-Bench Telecom, AA-LCR, AA-Omniscience, Humanity's Last Exam, GPQA Diamond, SciCode, IFBench, and CritPT. Source: Artificial Analysis.

This model's scores

Benchmark	Score
Artificial Analysis Intelligence Index	52 (index)
MMMU-Pro (vision)	80.5%
Humanity's Last Exam	39.9%
Humanity's Last Exam (with tools, Contemplating mode)	58.4%
FrontierScience Research (Contemplating mode)	38.3%
HealthBench Hard	0.428 (score)
SWE-Bench Verified	77.4%
ARC-AGI-2	42.5%
CritPT (physics research)	11%
Tau2-Bench Telecom	92%

Higher is better on every benchmark. Most entries are percentages; the Intelligence Index is reported as an index value and HealthBench Hard as a raw score.

Pricing

Input	$0.00 per 1M tokens (free)
Output	$0.00 per 1M tokens (free)

Muse Spark is currently free to use via meta.ai and the Meta AI app (Meta account required). Artificial Analysis lists input and output at $0.00 per 1M tokens. There is no public paid API yet — API access is in private preview for select partners, and Meta has not announced a public rate card.


Strengths

Leads the HealthBench Hard leaderboard (0.428), ahead of every other frontier model tested; trained on data curated with more than 1,000 physicians
Strong native multimodality — second-best vision model on MMMU-Pro (80.5%) behind only Gemini 3.1 Pro
Highly token-efficient for its intelligence: ~58M output tokens to run the Artificial Analysis Index, far less than GPT-5.4 or Claude Opus 4.6
Contemplating mode scales test-time compute via parallel agents, reaching 58.4 on Humanity's Last Exam (with tools)
Reaches Llama 4 Maverick-level capability with over an order of magnitude less training compute
Free to use today via meta.ai and the Meta AI app

Best for

Health and medical question-answering where factual, comprehensive responses matter (its strongest area)
Multimodal reasoning over images — visual STEM questions, entity recognition, and localization
Visual coding — generating websites and small games from visual context
Hard science and math reasoning, especially with Contemplating mode for the most difficult problems
Powering consumer assistants across Meta's apps (WhatsApp, Instagram, Messenger) and AI glasses
Agentic and tool-use workflows that benefit from multi-agent orchestration

How to access

Provider	Model ID
Meta AI	`muse-spark`

FAQ

Is Muse Spark open source?

No. Muse Spark is proprietary — it is Meta's first frontier model not released as open weights, a deliberate break from the open Llama line. Meta has said it hopes to open-source future versions of the model, but the April 2026 release is closed.

How much does Muse Spark cost?

It is currently free to use through meta.ai and the Meta AI app (a Meta account is required). Artificial Analysis lists input and output at $0.00 per 1M tokens. There is no public paid API yet — API access is in private preview for select partners, and Meta has not published a public rate card.

What is Contemplating mode?

Contemplating mode scales test-time compute by orchestrating multiple agents that reason in parallel and then aggregating their solutions, rather than relying on a single chain of thought. It lifts Muse Spark to 58.4 on Humanity's Last Exam (with tools) and 38.3 on FrontierScience Research, competing with extreme reasoning modes like Gemini Deep Think and GPT Pro.
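The parallel-then-aggregate pattern described above can be sketched in a few lines. This is an illustration only: Meta has not published how Contemplating mode aggregates solutions, so majority voting is an assumed stand-in, and the function `contemplate` and the stub agents are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def contemplate(question: str,
                agents: Sequence[Callable[[str], str]]) -> str:
    """Run every agent on the question in parallel, then return the majority answer.

    Assumption: aggregation by majority vote. The real aggregation step is
    not documented publicly.
    """
    if not agents:
        raise ValueError("at least one agent is required")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map preserves the order of the input agents.
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Ties are broken in favor of the answer that appeared first.
    return Counter(answers).most_common(1)[0][0]


# Stub agents standing in for independent reasoning runs of a model.
agents = [lambda q: "42", lambda q: "41", lambda q: "42"]
result = contemplate("What is 6 * 7?", agents)
```

The design point is that extra test-time compute is spent on breadth (several independent attempts) rather than on one longer chain of thought, so a single faulty line of reasoning can be outvoted.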

What is Muse Spark best at?

Health. Trained on data curated with more than 1,000 physicians, Muse Spark leads the HealthBench Hard leaderboard at 0.428, ahead of every other frontier model tested. It is also a strong vision model, scoring 80.5% on MMMU-Pro — second only to Gemini 3.1 Pro.
