Alibaba · 2026-04-10 · major
HappyHorse-1.0 — Alibaba's Video Generation Model Tops Arena Rankings
Alibaba reveals HappyHorse-1.0 after it anonymously topped the Artificial Analysis Video Arena in both text-to-video and image-to-video. The model generates video with synchronized audio from a single text prompt.

Alibaba's stealth video model topped blind rankings under a pseudonym, then revealed itself as the new arena champion.
Key specs
| Spec | Value |
|---|---|
| Text-to-video Elo | 1,389 |
| Image-to-video Elo | 1,416 |
| API access | April 30 |
What is it?
HappyHorse-1.0 is an AI video generation model from Alibaba's Taotian Future Life Lab (ATH AI Innovation Unit). It appeared anonymously on the Artificial Analysis benchmarking platform around April 7 and quickly climbed to the top of blind-test rankings for both text-to-video and image-to-video generation. On April 10, Alibaba confirmed ownership.
How does it work?
HappyHorse uses a Transfusion architecture, which combines discrete text modeling with continuous visual diffusion in a single Transformer sequence. It generates both video and synchronized audio from one prompt, supports 6 languages, and produces output in 8 generation steps with a reported 1.2x acceleration. Alibaba reports accurate lip-sync with very low word error rates.
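Transfusion-style models train one Transformer on a mixed sequence of text tokens and continuous latents. The sketch here is a minimal illustration, assuming the general recipe from Meta's published Transfusion work rather than anything Alibaba has disclosed about HappyHorse: every position gets ordinary causal attention, positions inside the same continuous block (standing in for video or audio latents) also attend to each other bidirectionally, and training uses one loss that adds next-token cross-entropy to a weighted noise-prediction error. The helper names, the `"continuous"` label, and the weight `lam` are hypothetical.

```python
import math
from typing import List

# Hypothetical helper, assuming a Transfusion-style attention pattern:
# causal attention everywhere, plus full (bidirectional) attention among
# positions that belong to the same continuous (video/audio latent) block.
def transfusion_mask(modality: List[str], block_id: List[int]) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if position i may attend to j."""
    n = len(modality)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            causal = j <= i
            same_block = (
                modality[i] == "continuous"
                and modality[j] == "continuous"
                and block_id[i] == block_id[j]
            )
            mask[i][j] = causal or same_block
    return mask


def next_token_nll(probs: List[float], target: int) -> float:
    """Language-modeling loss for one discrete text token."""
    return -math.log(probs[target])


def noise_mse(pred_noise: List[float], true_noise: List[float]) -> float:
    """Diffusion loss: mean squared error between predicted and true noise."""
    return sum((p - t) ** 2 for p, t in zip(pred_noise, true_noise)) / len(true_noise)


def combined_loss(lm_loss: float, diffusion_loss: float, lam: float = 5.0) -> float:
    """One training objective: L = L_LM + lam * L_diffusion.
    lam = 5.0 is an assumed weight, not a reported HappyHorse setting."""
    return lm_loss + lam * diffusion_loss


if __name__ == "__main__":
    # Two text tokens, one 3-patch continuous block, then one more text token.
    modality = ["text", "text", "continuous", "continuous", "continuous", "text"]
    block_id = [-1, -1, 0, 0, 0, -1]
    mask = transfusion_mask(modality, block_id)
    print(mask[2][4])  # True: patches in one block see each other
    print(mask[1][2])  # False: text cannot look ahead into the block
    loss = combined_loss(
        next_token_nll([0.25, 0.5, 0.25], target=1),
        noise_mse([0.1, -0.2], [0.0, 0.0]),
    )
    print(f"combined loss: {loss:.3f}")
```

The design point the sketch tries to capture is that text and visual or audio content share one sequence and one set of weights; only the attention pattern and the per-position loss differ by modality. This is what lets a single prompt drive both video and its synchronized audio without a separate audio pipeline.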
Why does it matter?
Video generation has been dominated by established players such as Runway, Kling, and Google Veo. HappyHorse's anonymous debut and rapid climb to first place in blind rankings suggest that Alibaba's newly formed video team can compete at the frontier. The integrated audio synthesis is a practical differentiator, since most video models still require a separate audio pipeline.
Who is it for?
Content creators, video production teams, developers building video generation pipelines.