AI/TLDR

NVIDIA Labs · 2026-06-16 · major

cuTile Rust v0.2.0 — NVIDIA Labs ships NVFP4 GPU kernels in safe Rust

NVIDIA Labs ships cuTile Rust v0.2.0, a safe tile-based GPU kernel DSL for Rust with NVFP4 packing and block-scaled GEMM on B200. A companion paper, Fearless Concurrency on the GPU, reports 7 TB/s element-wise and 2 PFlop/s GEMM throughput.

Repository: NVlabs/cutile-rs (GitHub)

cuTile Rust is NVIDIA Labs' safe tile-based GPU kernel DSL for Rust, now with NVFP4 packing and a new performance paper.

Key specs

License: Apache-2.0
GitHub stars: 494
Element-wise throughput: 7 TB/s on B200
GEMM throughput: 2 PFlop/s on B200

What is it?

cuTile Rust is an open-source Rust DSL from NVIDIA Labs for writing safe, data-race-free GPU kernels with a tile-based memory model. It gives Rust programmers ownership-checked tensor handles, async kernel launches, and a host API that prevents device pointers from being used past their lifetime.
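To make the data-race-free claim concrete, here is a CPU-only analogy in plain Rust. It is a sketch using only the standard library, not the cuTile Rust API, and the name `scale_tiles` is hypothetical. The idea it illustrates is that splitting a buffer into disjoint, exclusively borrowed tiles lets the compiler reject any program in which two workers write the same element.

```rust
use std::thread;

/// Multiplies every element of `data` by `factor`, one thread per tile.
/// NOTE: CPU analogy only; this is not how cuTile Rust launches kernels.
fn scale_tiles(data: &mut [f32], tile_len: usize, factor: f32) {
    assert!(tile_len > 0, "tile_len must be non-zero");
    thread::scope(|s| {
        // `chunks_mut` yields non-overlapping `&mut [f32]` tiles, so each
        // thread owns its tile exclusively for the duration of the scope.
        for tile in data.chunks_mut(tile_len) {
            s.spawn(move || {
                for x in tile.iter_mut() {
                    *x *= factor;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}
```

Calling `scale_tiles(&mut buf, 4, 2.0)` on a 10-element buffer processes tiles of 4, 4, and 2 elements. Trying to hand the same tile to two threads fails at compile time, which is the property the tile-based model relies on.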

How does it work?

cuTile Rust v0.2.0 adds CUDA 13.3 low-precision support (NVFP4 packing and unpacking, plus block-scaled matrix multiply) and a new cutile-kernels crate of reusable inference primitives. The release also ships executable NVFP4 and MXFP8 examples and reproducibility artifacts for the Fearless Concurrency on the GPU paper (arXiv 2606.15991), which reports 7 TB/s element-wise and 2 PFlop/s GEMM on B200.
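For readers unfamiliar with NVFP4 packing, the following is a minimal host-side sketch in plain Rust of what block-scaled 4-bit packing involves. It is not taken from cuTile Rust, and every function name here is hypothetical. It rests on several stated assumptions: values are 4-bit E2M1 (sign bit plus eight magnitudes up to 6.0), one scale is shared per 16-element block, the first element of each pair goes in the low nibble, and the scale is kept as an `f32` for simplicity, whereas real NVFP4 stores block scales in FP8. Rounding is nearest-magnitude, with ties going to the smaller value.

```rust
/// Magnitudes representable in E2M1, indexed by the low 3 bits of the code.
const E2M1: [f32; 8] = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0];
/// Number of elements that share one scale (assumed NVFP4 block size).
const BLOCK: usize = 16;

/// Encodes an already-scaled value to the nearest 4-bit E2M1 code.
fn encode_e2m1(x: f32) -> u8 {
    let sign = if x.is_sign_negative() { 0x8 } else { 0x0 };
    let mag = x.abs().min(6.0); // clamp to the largest representable value
    let mut best = 0usize;
    for (i, &m) in E2M1.iter().enumerate() {
        if (mag - m).abs() < (mag - E2M1[best]).abs() {
            best = i;
        }
    }
    sign | best as u8
}

/// Decodes a 4-bit E2M1 code (bit 3 is the sign).
fn decode_e2m1(code: u8) -> f32 {
    let m = E2M1[(code & 0x7) as usize];
    if code & 0x8 != 0 { -m } else { m }
}

/// Quantizes one block: returns its scale and 8 bytes holding 16 codes.
fn pack_block(block: &[f32; BLOCK]) -> (f32, [u8; BLOCK / 2]) {
    let amax = block.iter().fold(0.0f32, |a, &x| a.max(x.abs()));
    // Map the largest magnitude in the block onto 6.0.
    let scale = if amax > 0.0 { amax / 6.0 } else { 1.0 };
    let mut packed = [0u8; BLOCK / 2];
    for (i, pair) in block.chunks_exact(2).enumerate() {
        let lo = encode_e2m1(pair[0] / scale);
        let hi = encode_e2m1(pair[1] / scale);
        packed[i] = lo | (hi << 4); // assumed nibble order: low first
    }
    (scale, packed)
}

/// Reverses `pack_block`, returning the dequantized values.
fn unpack_block(scale: f32, packed: &[u8; BLOCK / 2]) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for (i, &byte) in packed.iter().enumerate() {
        out[2 * i] = decode_e2m1(byte & 0x0F) * scale;
        out[2 * i + 1] = decode_e2m1(byte >> 4) * scale;
    }
    out
}
```

A block-scaled matrix multiply would apply the per-block scales of both operands while accumulating products of the decoded values. This sketch covers only the pack and unpack round trip, which is lossy unless the scaled inputs already lie on the E2M1 grid.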

Why does it matter?

cuTile Rust lets safety-critical Rust codebases compile their own GPU kernels without writing CUDA C++, with the same NVFP4 paths NVIDIA uses on Blackwell. For ML systems teams, this closes a gap: Rust LLM runtimes can now share low-precision inference kernels with Python frameworks instead of dispatching through a foreign-function boundary.

Who is it for?

ML systems engineers, Rust GPU kernel authors, NVFP4 inference researchers

Try it

cargo add cutile@0.2.0 --features kernels

Tags

  • nvidia
  • rust
  • gpu
  • cuda
  • nvfp4
  • kernels
  • low-precision
  • blackwell
  • b200
  • open-source