New AI Algorithms & Techniques Worth Knowing

Named AI techniques and algorithms worth knowing: the methods behind the models, distilled into plain-English explainers.

8 releases tracked

Mamba-3: Complex-Valued SSM with MIMO Updates Advances the Linear-Attention Frontier
CMU / Princeton · 2026-04-23 · major
Mamba-3 gets more out of state space models by using complex-valued dynamics and multi-input multi-output (MIMO) updates, without adding parameters.
ZipCCL: Lossless Gradient Compression Cuts Distributed LLM Training Communication by 1.35×
Multiple institutions · 2026-04-30 · notable
ZipCCL losslessly compresses the gradient collectives used in LLM training by exploiting the near-Gaussian structure of model tensors, so accuracy is unaffected.
AGoQ: 4-Bit Activation + 8-Bit Gradient Quantization Cuts Distributed LLM Training Memory by 52%
Multiple institutions · 2026-05-01 · notable
AGoQ stores activations at close to 4 bits and communicates gradients at 8 bits, roughly halving training memory without hurting convergence.
TurboQuant, Google's ICLR 2026 KV Cache Compression: 6× Memory, 8× Speed, Zero Accuracy Loss
Google Research / Google DeepMind / NYU · 2026-04-23 · major
TurboQuant compresses the KV cache to 3 bits with zero accuracy loss, giving 6× less memory and 8× faster attention on H100s.
Unstructured Pruning Boosts LLM Test-Time Scaling: Pruned Models Can Outperform Their Unpruned Versions
Multiple institutions · 2026-04-28 · notable
Targeted weight removal can improve reasoning under test-time scaling, overturning the assumption that pruning hurts capability.
Decoupled DiLoCo: 236× Less Bandwidth for Distributed LLM Training
Google DeepMind · 2026-04-23 · major
Google DeepMind cuts inter-datacenter training bandwidth by 236× while maintaining 88% goodput when chips fail, validated at 12B scale.
DDTree: Diffusion Draft Trees for Faster Speculative Decoding
Technion · 2026-04-14 · notable
DDTree performs speculative decoding with diffusion draft trees, reaching up to 8.22× speedup over autoregressive inference and beating EAGLE-3.
TriAttention: Trigonometric KV Cache Compression
NVIDIA · 2026-04-07 · notable
TriAttention, NVIDIA's new attention technique, compresses the KV cache by exploiting a geometric property that query and key vectors have before RoPE is applied.
