Eyeline Labs / Netflix · 2026-04-23 · notable
Vista4D: Re-Render Any Video from a New Camera Angle (CVPR 2026 Highlight)
Vista4D, a CVPR 2026 Highlight from Eyeline Labs and Netflix, scaffolds any monocular video with a 4D point cloud, then re-renders the scene along any camera path using a fine-tuned Wan2.1 14B diffusion transformer. It has 76 Hugging Face upvotes, and users preferred Vista4D over the best baseline for overall fidelity 77.4% of the time.

Give any monocular video a 4D point-cloud scaffold, then re-render the scene from any camera angle you choose.
Key specs
| Metric | Value |
|---|---|
| GitHub stars | 81 |
| User preference (overall fidelity) | 77.4% |
| Hugging Face upvotes | 76 |
| Conference | CVPR 2026 Highlight |
What is it?
Vista4D is a CVPR 2026 Highlight paper from researchers at Eyeline Labs, Netflix, Columbia, UCLA, Stony Brook, and Oxford. It takes a monocular source video and synthesizes the same dynamic scene from entirely different camera trajectories and viewpoints, in effect reshooting the footage from angles that were never physically captured.
How does it work?
The method reconstructs a temporally persistent 4D point cloud from the source video using static pixel segmentation and depth estimation. A fine-tuned Wan2.1 14B video diffusion transformer is then conditioned on both the source video and a point-cloud render of the target view. The conditioning is supplied by concatenating latent tokens in context rather than through cross-attention. The system is trained on noisy reconstructed multi-view data specifically so that it stays robust to depth estimation artifacts at inference time. Optional extensions support 4D scene recomposition (editing the point cloud to insert or remove subjects) and dynamic scene expansion by incorporating additional camera captures.
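To make the "point-cloud render of the target view" concrete, here is a minimal geometric sketch in Python with NumPy. It is not the Vista4D code: it assumes a simple pinhole camera, a single frame, and a dense depth map, and the function names `unproject_depth` and `render_points` are hypothetical. It skips the static pixel segmentation and temporal persistence described above and only shows how depth lifts pixels into 3D and how those points land in a new camera.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to (H*W, 3) world-space points (pinhole model)."""
    h, w = depth.shape
    # u indexes columns (x), v indexes rows (y); both have shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pixels @ np.linalg.inv(K).T            # camera-space rays with z = 1
    points_cam = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    points_h = np.concatenate([points_cam, np.ones((len(points_cam), 1))], axis=1)
    return (points_h @ cam_to_world.T)[:, :3]

def render_points(points_world, colors, K, world_to_cam, h, w):
    """Project colored points into a target camera, keeping the nearest point per pixel.

    Returns an (H, W, 3) image and an (H, W) boolean mask of covered pixels.
    """
    points_h = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    points_cam = (points_h @ world_to_cam.T)[:, :3]
    in_front = points_cam[:, 2] > 1e-6            # drop points behind the camera
    points_cam, colors = points_cam[in_front], colors[in_front]

    proj = points_cam @ K.T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, colors = u[inside], v[inside], points_cam[inside, 2], colors[inside]

    # Sort near-to-far, then keep the first (nearest) point that hits each pixel.
    order = np.argsort(z, kind="stable")
    flat = (v * w + u)[order]
    _, first = np.unique(flat, return_index=True)
    keep = order[first]

    image = np.zeros((h, w, 3), dtype=colors.dtype)
    mask = np.zeros((h, w), dtype=bool)
    image[v[keep], u[keep]] = colors[keep]
    mask[v[keep], u[keep]] = True
    return image, mask
```

Moving the target camera sideways shifts the rendered points and leaves uncovered pixels where the source camera saw nothing. In a pipeline like the one described, a render with such holes would be the geometric guide handed to the diffusion model, which presumably fills the gaps and cleans up depth errors.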
Why does it matter?
Prior video reshooting methods fail on real-world footage because depth estimation artifacts cause jitter and visual inconsistency across viewpoints. Vista4D's training on noisy reconstructions directly addresses this, producing stable re-renders under large viewpoint changes. Users preferred it over the strongest baseline 77.4% of the time for overall fidelity. The work is relevant to VFX teams generating alternate angles from a single camera pass, to autonomous driving data augmentation, and to video editing tools that need synthetic multi-viewpoint data.
Who is it for?
Computer vision researchers, VFX and video production teams, and developers building video editing tools.
Try it
Code and weights: https://github.com/Eyeline-Labs/Vista4D