AI/TLDR

Two Minute Papers · 2026-06-22 · notable

Two Minute Papers: 'DeepSeek Just Solved AI's Billion Dollar Problem'

Two Minute Papers walks through the DualPath paper, which attacks the KV-cache I/O bottleneck behind agentic LLM serving costs and reports an average of 1.96x higher online throughput.

Two Minute Papers explains why storage bandwidth, not raw compute, has been the real cost driver behind agentic LLM inference.

What is it?

A Two Minute Papers video covering DualPath, a research paper that targets the KV-cache loading bottleneck in disaggregated LLM serving. The video frames the result as a fix for the cost problem that has made running long-context agentic workloads so expensive at scale.

How does it work?

DualPath adds a second KV-cache loading route alongside the usual storage-to-prefill path: a storage-to-decode path that uses idle decode-engine NICs and then ships the cache to prefill engines over RDMA. Dynamic load balancing across the two paths keeps both sides busy instead of bandwidth-starving the prefill side.
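To make the load-balancing idea concrete, here is a minimal sketch in Python. It is not the paper's scheduler: the `Path` class, the `assign` function, the path names, and the bandwidth figures are all hypothetical, and the greedy "earliest estimated finish" rule is a stand-in for whatever policy DualPath actually uses. It also ignores the extra RDMA hop on the storage-to-decode route.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One KV-cache loading route (hypothetical model, not the paper's)."""
    name: str
    bandwidth_gb_per_s: float  # assumed usable bandwidth of this route
    queued_gb: float = 0.0     # data already scheduled on this route

    def eta(self, size_gb: float) -> float:
        """Estimated seconds until a new load of size_gb would finish."""
        return (self.queued_gb + size_gb) / self.bandwidth_gb_per_s


def assign(request_sizes_gb, paths):
    """Greedily send each KV-cache load to the route that finishes it soonest."""
    plan = []
    for size in request_sizes_gb:
        best = min(paths, key=lambda p: p.eta(size))  # ties go to the first path
        best.queued_gb += size
        plan.append(best.name)
    return plan


if __name__ == "__main__":
    routes = [
        Path("storage_to_prefill", bandwidth_gb_per_s=10.0),
        Path("storage_to_decode", bandwidth_gb_per_s=10.0),
    ]
    print(assign([4, 4, 4, 4], routes))
```

With two routes of equal assumed bandwidth, the loads alternate between them, so neither NIC group sits idle while the other is saturated. With a single route, every load would queue behind the previous one.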

Why does it matter?

The paper reports up to 1.87x higher offline throughput and an average of 1.96x higher online throughput on production agentic workloads, with no service-level objective (SLO) violations. For anyone running long-running agents or multi-turn tools, that maps directly to lower per-token serving cost without new hardware.
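As a rough back-of-the-envelope check on the cost claim, the sketch below converts a throughput gain into a relative per-token cost. It assumes hardware spend stays fixed and that cost per token scales inversely with throughput. That is a simplification, not a figure from the paper, and the function name is hypothetical.

```python
def relative_cost_per_token(throughput_gain: float) -> float:
    """Cost per token relative to baseline, assuming fixed hardware cost."""
    if throughput_gain <= 0:
        raise ValueError("throughput_gain must be positive")
    return 1.0 / throughput_gain


# Throughput gains as quoted in this summary.
for label, gain in [("offline (up to)", 1.87), ("online (average)", 1.96)]:
    rel = relative_cost_per_token(gain)
    print(f"{label}: {rel:.2f}x baseline cost per token, {100 * (1 - rel):.0f}% lower")
```

Under that assumption, a 1.96x throughput gain works out to roughly half the baseline cost per token. Real savings depend on how much of a deployment's load is actually bound by KV-cache I/O.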

Who is it for?

ML infra engineers and teams running agentic LLM workloads

Try it

https://arxiv.org/abs/2602.21548

Tags

  • video
  • two-minute-papers
  • deepseek
  • kv-cache
  • inference
  • agentic
  • throughput
  • dualpath
