Netflix · 2026-04-03 · major

VOID — Netflix's Open-Source Video Object and Interaction Deletion Model

Netflix's first open-source AI model removes objects from video and rewrites the physics of the scene — if you delete a person holding a guitar, the guitar falls naturally. Apache 2.0 licensed.

Figure: VOID model diagram showing video object removal with physics-aware inpainting.

Remove an object from a video and the model figures out what the rest of the scene would have done without it.

Key specs

License	Apache 2.0
GitHub stars	1.5k
VRAM required	40 GB+

What is it?

VOID (Video Object and Interaction Deletion) is Netflix Research's first publicly released AI model. It performs video inpainting (removing objects from video), but unlike prior tools that only fill in the background, VOID also handles the downstream physical consequences. Remove a person pushing a ball, and the ball stops moving. Remove someone holding a guitar, and the guitar drops.

How does it work?

VOID is built on CogVideoX and uses a two-pass approach. First, a vision-language model identifies the causally affected regions of the scene: areas whose behavior would change if the target object were absent. A video diffusion model, fine-tuned on synthetic counterfactual data generated with Kubric and HUMOTO, then inpaints those regions with physically plausible outcomes. The second pass improves temporal consistency.
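To make the idea of "causally affected regions" concrete, here is a toy sketch in plain Python. It is not VOID's code: the function names are hypothetical, and both the vision-language model and the diffusion model are replaced by hand-written inputs. It only shows the generic inpainting convention that the region to regenerate is the removed object plus everything it was physically influencing, and that pixels outside that region are kept as-is.

```python
from typing import List

Mask = List[List[bool]]   # one frame: rows of booleans, True = regenerate
Frame = List[List[int]]   # one frame: rows of grayscale pixel values


def union_masks(object_mask: Mask, affected_mask: Mask) -> Mask:
    """Combine the removed object's mask with the causally affected region."""
    return [
        [obj or aff for obj, aff in zip(obj_row, aff_row)]
        for obj_row, aff_row in zip(object_mask, affected_mask)
    ]


def composite(original: Frame, generated: Frame, mask: Mask) -> Frame:
    """Take generated pixels inside the mask and original pixels outside it."""
    return [
        [gen if m else orig for orig, gen, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


# A 1x4 "frame": [person, guitar, wall, wall].
original = [[10, 20, 30, 40]]
object_mask = [[True, False, False, False]]    # the person to delete
affected_mask = [[False, True, False, False]]  # the guitar they were holding
generated = [[0, 0, 0, 0]]                     # stand-in for model output

mask = union_masks(object_mask, affected_mask)
result = composite(original, generated, mask)
print(mask)    # [[True, True, False, False]]
print(result)  # [[0, 0, 30, 40]]
```

A background-only inpainter would regenerate just the first pixel and leave the guitar floating. Widening the mask to the affected region is what gives the generative stage room to produce a different outcome for the guitar.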

Why does it matter?

Video editing tools today struggle with interaction-aware removal. Deleting a person typically leaves floating objects or frozen physics. VOID addresses this by treating video editing as a causal reasoning problem rather than pure pixel filling. Because it is open source under Apache 2.0, VFX teams and indie filmmakers can integrate it into existing pipelines.

Who is it for?

VFX artists, video editors, film production teams, and computer vision researchers.

Try it

Clone the repo and run the included Jupyter notebook (requires an A100 or a similar GPU with 40 GB+ of VRAM).
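Before downloading anything, it may be worth checking whether the local GPU meets the memory requirement. The sketch below is not part of the VOID repo. It assumes an NVIDIA GPU with `nvidia-smi` on the PATH, and the 40,000 MiB cutoff is an assumption that approximates the "40GB+" figure, since drivers typically report a total slightly below the nominal capacity.

```python
import subprocess
from typing import List

# Assumption: a loose stand-in for the "40GB+" requirement quoted above.
MIN_VRAM_MIB = 40_000


def parse_memory_totals(output: str) -> List[int]:
    """Parse nvidia-smi CSV output: one total-memory value (MiB) per line."""
    values = []
    for line in output.splitlines():
        text = line.strip()
        if text.isdigit():  # skip blanks and entries such as "[N/A]"
            values.append(int(text))
    return values


def query_memory_totals() -> List[int]:
    """Ask nvidia-smi for each GPU's total memory; return [] if unavailable."""
    try:
        completed = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_memory_totals(completed.stdout)


def has_enough_vram(totals_mib: List[int],
                    minimum_mib: int = MIN_VRAM_MIB) -> bool:
    """True if at least one GPU reports the minimum amount of memory."""
    return any(total >= minimum_mib for total in totals_mib)


if __name__ == "__main__":
    totals = query_memory_totals()
    print("GPU memory (MiB):", totals)
    print("Meets the 40 GB+ requirement:", has_enough_vram(totals))
```

The parsing is kept separate from the `nvidia-smi` call so that the threshold logic can be checked on a machine without a GPU.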