AI/TLDR

Alibaba · 2026-04-10 · major

HappyHorse-1.0 — Alibaba's Video Generation Model Tops Arena Rankings

Alibaba reveals HappyHorse-1.0 after it anonymously topped the Artificial Analysis Video Arena in both text-to-video and image-to-video. The model generates video with synchronized audio from a single text prompt.


Alibaba's stealth video model topped blind rankings under a pseudonym, then revealed itself as the new arena champion.

Key specs

Text-to-video Elo: 1,389
Image-to-video Elo: 1,416
API access: April 30
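The arena scores are Elo-style ratings. As a rough guide to reading them, the standard Elo formula maps a rating gap between two models to an expected head-to-head win rate. This sketch assumes that textbook formula with a 400-point scale; Artificial Analysis's exact rating method may differ. The text-to-video and image-to-video scores come from separate leaderboards, so they should not be compared with each other.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (roughly, win probability) of A against B under the standard Elo model."""
    # Logistic curve on a 400-point scale: a 400-point lead means about 10:1 odds.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical opponent rated 100 points below HappyHorse's text-to-video score.
print(round(elo_expected_score(1389, 1289), 3))
```

Under these assumptions, a 100-point lead corresponds to winning roughly 64% of blind comparisons, not an overwhelming margin.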

What is it?

HappyHorse-1.0 is an AI video generation model from Alibaba's Taotian Future Life Lab (ATH AI Innovation Unit). It appeared anonymously on the Artificial Analysis benchmarking platform around April 7 and quickly climbed to the top of blind-test rankings for both text-to-video and image-to-video generation. On April 10, Alibaba confirmed ownership.

How does it work?

HappyHorse uses a Transfusion-style architecture, which combines discrete text-token modeling with continuous diffusion over visual signals in a single Transformer sequence. It generates video and synchronized audio from one prompt, supports 6 languages, and produces output in 8 sampling steps with a stated 1.2x acceleration. The model reportedly achieves accurate lip-sync with very low word error rates.
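In the original Transfusion work from Meta, one Transformer processes mixed sequences: text tokens use ordinary causal attention, while the patches of each image attend to one another bidirectionally. The sketch below builds that attention mask for a toy sequence. It illustrates the published Transfusion scheme only; how HappyHorse places video frames and audio in its sequence is not described in the source, and the `segments` format and `transfusion_attention_mask` function are hypothetical simplifications.

```python
def transfusion_attention_mask(segments):
    """Build a Transfusion-style attention mask (illustrative sketch).

    segments: list of (kind, length) pairs, kind being "text" or "image"
    (a hypothetical format). Returns a square list of lists where
    mask[i][j] is True if position i may attend to position j.
    """
    total = sum(length for _, length in segments)
    # Start with a causal mask: every position sees itself and all earlier positions.
    mask = [[j <= i for j in range(total)] for i in range(total)]
    start = 0
    for kind, length in segments:
        if kind == "image":
            # Patches within one image see each other in both directions.
            for i in range(start, start + length):
                for j in range(start, start + length):
                    mask[i][j] = True
        elif kind != "text":
            raise ValueError(f"unknown segment kind: {kind!r}")
        start += length
    return mask


# Two text tokens, one image of three patches, then one more text token.
mask = transfusion_attention_mask([("text", 2), ("image", 3), ("text", 1)])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Rows 2 to 4 (the image patches) can see every patch of their own image, including later ones, while text positions stay strictly causal. This lets the diffusion objective denoise a whole image at once inside an otherwise autoregressive sequence.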

Why does it matter?

Video generation has been dominated by established players such as Runway, Kling, and Google Veo. HappyHorse's anonymous debut and rapid climb to first place in blind rankings suggest that Alibaba's newly formed video team can compete at the frontier. Integrated audio synthesis is a practical differentiator: most video models still require a separate audio pipeline.

Who is it for?

Content creators, video production teams, developers building video generation pipelines.


Tags

  • video-generation
  • text-to-video
  • image-to-video
  • audio-synthesis
  • alibaba
