IBM Research · 2026-04-29 · major

IBM Granite 4.1 — 8B Instruct Matches Granite 4.0's 32B MoE on Tool Calling and Instruction Following

Item: IBM Granite 4.1 — 8B Instruct Matches Granite 4.0's 32B MoE on Tool Calling and Instruction Following
Rating: 4
Author: AI/TLDR

IBM ships Granite 4.1: dense decoder language models in 3B / 8B / 30B base + instruct flavours, plus refreshed Speech, Vision, Guardian, and multilingual Embedding R2 models, all under Apache 2.0 with up to 512K context.

IBM Granite 4.1 release artwork — three-tile illustration of the new model family — IBM Research

IBM's open-weight enterprise model family steps up: smaller dense models that match the previous generation's 32B mixture-of-experts.

Key specs

Context window	512K
Training tokens	~15T
Speech wer2 b	5.33%

What is it?

Granite 4.1 is the latest revision of IBM Research's open-weight Granite collection, released April 29, 2026 under Apache 2.0. The headline pieces are dense decoder-only language models at 3B, 8B, and 30B parameters in both base and instruct variants. The release also refreshes Granite Vision (document, table, chart, KVP extraction), Granite Speech (multilingual ASR and translation), Granite Guardian (harm detection), and a new Granite Embedding Multilingual R2 covering 200+ languages with up to 512K context.

How does it work?

IBM dropped the hybrid Mamba/MoE design of Granite 4.0 in favour of a straight dense, decoder-only transformer — the same architecture that dominates community fine-tuning. Training ran roughly 15 trillion tokens across multiple phases that progressively annealed toward higher-quality technical and scientific data. The team leans on data quality and post-training rather than long chain-of-thought reasoning: IBM Research engineer Rameswar Panda says Granite 4.1 'delivers competitive instruction-following and tool-calling performance without relying on long chains of thought.' The 8B instruct model is reported to match or beat Granite 4.0's 32B MoE on instruction-following and tool-calling benchmarks while being far cheaper to fine-tune and serve.

Why does it matter?

Granite is one of the few enterprise-targeted, Apache-2.0 model families with the full stack — text, speech, vision, embeddings, and a safety classifier — under a single license. Going back to dense weights (no MoE routing, no exotic state-space layers) makes it much easier to LoRA-finetune, quantize, and deploy on the kinds of mid-tier GPUs and on-prem hardware that IBM's enterprise customers actually run. The 30B instruct model gives a serious open option for tool-using agents that don't want to depend on a frontier API.

Who is it for?

Enterprise teams, on-prem deployments, fine-tuners

Try it

ollama run granite4.1:8b