New AI Algorithms & Techniques Worth Knowing
The methods behind the models: named AI techniques and algorithms, distilled into plain-English explainers.
8 releases tracked
- Mamba-3 — Complex-Valued SSM with MIMO Updates Advances the Linear-Attention Frontier
Mamba-3 squeezes more from state space models with complex-valued dynamics and multi-input multi-output (MIMO) updates, with no extra parameters needed.
- ZipCCL: Lossless Gradient Compression Cuts Distributed LLM Training Communication by 1.35×
ZipCCL losslessly compresses LLM gradient collectives using the near-Gaussian structure of model tensors — no accuracy loss.
- AGoQ — 4-Bit Activation + 8-Bit Gradient Quantization Cuts Distributed LLM Training Memory by 52%
AGoQ achieves near-4-bit activation storage with 8-bit gradient communication, roughly halving memory without hurting convergence.
- TurboQuant: Google's ICLR 2026 KV Cache Compression with 6× Less Memory, 8× Faster Attention, Zero Accuracy Loss
TurboQuant compresses the KV cache to 3 bits with zero accuracy loss, giving 6× less memory and 8× faster attention on H100s.
- Unstructured Pruning Boosts LLM Test-Time Scaling — Pruned Models Can Outperform Their Unpruned Versions
Targeted weight removal can actually improve reasoning under test-time scaling — flipping the assumption that pruning hurts capability.
- Decoupled DiLoCo: 236× Less Bandwidth for Distributed LLM Training
Google DeepMind cuts inter-datacenter training bandwidth 236× while maintaining 88% goodput when chips fail — validated at 12B scale.
- DDTree — Diffusion Draft Trees for Faster Speculative Decoding
Speculative decoding via diffusion draft trees — up to 8.22× speedup over autoregressive inference, beating EAGLE-3.
- TriAttention: Trigonometric KV Cache Compression
NVIDIA's new attention technique compresses the KV cache by exploiting a geometric property of Q/K vectors before RoPE is applied.
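
For readers new to the speculative decoding idea behind the DDTree entry above, here is a minimal sketch of the generic draft-then-verify loop. It is not DDTree itself: it uses a single linear draft instead of diffusion draft trees, and the toy `target` and `draft` functions, the `k` parameter, and all helper names are hypothetical stand-ins chosen for illustration. The point it shows is that accepted draft tokens never change the output, which always matches plain greedy decoding with the target model.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "next token" function


def greedy_decode(model: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Plain autoregressive decoding: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], n_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding with a linear draft of up to k tokens."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        base = list(seq)
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(base))):
            proposal.append(draft(base + proposal))
        # 2. The target checks each position. A real system scores all
        #    positions in one batched pass; here it is a loop for clarity.
        for i, tok in enumerate(proposal):
            expected = target(base + proposal[:i])
            if expected != tok:
                seq.append(expected)  # first mismatch: keep the target's token
                break
            seq.append(tok)           # match: the draft token is accepted
    return seq


# Toy models (assumptions for the demo): the draft agrees with the target
# except when the context length is a multiple of 5.
def target(seq: List[Token]) -> Token:
    return (seq[-1] * 3 + 1) % 11


def draft(seq: List[Token]) -> Token:
    return 0 if len(seq) % 5 == 0 else target(seq)


if __name__ == "__main__":
    print(speculative_decode(target, draft, [1], 12))
```

The speedup in practice comes from step 2 being a single parallel target pass over several drafted tokens; tree-based drafters verify many candidate continuations in that one pass rather than a single chain.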