Arcee AI · 2026-04-01 · major
Trinity-Large-Thinking — Arcee's 400B Open Reasoning Agent Model
Open-weight 400B MoE reasoning model with 13B active parameters per token, Apache 2.0 licensed. Ranks #2 on PinchBench for autonomous agents at $0.85/M output tokens.

A 400B sparse model that activates only 13B parameters per token and costs 96% less than Opus for agent tasks.
Key specs
| License | Apache 2.0 |
|---|---|
| Parameters | 400B |
| Active params | 13B |
| Price (output) | $0.85/M tokens |
| Pinch bench rank | #2 |
What is it?
Trinity-Large-Thinking is a 400-billion-parameter sparse Mixture-of-Experts model from Arcee AI, a small US-based startup. It uses a 4-of-256 expert routing strategy, meaning only 13 billion parameters are active on any given token. The model is trained specifically for long-horizon reasoning and multi-turn tool calling, not general chat.
How does it work?
Built on Trinity-Large-Base, the model is post-trained with extended chain-of-thought reasoning and agentic reinforcement learning. It generates explicit reasoning traces in think blocks before producing its final response. The extreme sparsity (1.56% routing fraction) keeps inference cost low while maintaining a massive total knowledge base.
Why does it matter?
Open-weight models that perform well on autonomous agent benchmarks have been rare. Trinity-Large-Thinking ranks #2 on PinchBench (behind only Opus) while costing roughly $0.85 per million output tokens on OpenRouter. For teams building production agents, this changes the cost calculus significantly.
Who is it for?
Teams building production AI agents who need strong reasoning at low inference cost.
Try it
openrouter.ai/arcee-ai/trinity-large-thinking