
HappyHorse 1.0: The Mysterious AI Video Model That Conquered the Leaderboard Overnight

A pseudonymous 15B-parameter model appeared from nowhere, dethroned every incumbent, and turned out to be fully open source

PicMorph Team
April 8, 2026 · 8 min read


In the first week of April 2026, a model nobody had heard of quietly entered the Artificial Analysis Video Arena — a blind-test leaderboard where real users vote on video quality without knowing which model produced which clip. Within days, HappyHorse-1.0 sat at the top of every major category, leaving established players like Seedance 2.0, Kling, and Veo in its wake.

No press conference. No corporate announcement. Just results.

The Numbers That Shocked the Industry

| Category | Elo Rating | Rank |
| --- | --- | --- |
| Text-to-Video (no audio) | 1333–1357 | #1 |
| Image-to-Video (no audio) | 1391–1406 | #1 (all-time high) |
| Text-to-Video (with audio) | ~1205 | #2 |
| Image-to-Video (with audio) | ~1161 | #2 |

The no-audio image-to-video score of 1406 set a new all-time record on the Arena. In the no-audio text-to-video category, HappyHorse led the previous champion Seedance 2.0 by roughly 60 Elo points — a massive gap in blind user-preference voting.
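To put that lead in perspective, the standard Elo formula converts a rating gap into an expected head-to-head win rate. This is the generic formula, not anything specific to the Arena's implementation:

```python
# Standard Elo expected-score formula: probability that the higher-rated
# model wins a single blind matchup, given the rating gap between the two.

def elo_win_probability(gap: float) -> float:
    """Expected win rate for the higher-rated model at a given Elo gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

p = elo_win_probability(60)
print(f"{p:.1%}")  # a 60-point lead -> wins roughly 58.5% of blind votes
```

In other words, a 60-point gap means users preferred HappyHorse's clip in well over half of all direct comparisons, a decisive margin when votes number in the thousands.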

Who Built It?

The mystery was part of the magic. Artificial Analysis labeled the model only as "pseudonymous" when it was added to the Arena. Speculation ran wild on X, Reddit, and Chinese tech forums.

The truth emerged quickly: Zhang Di, former Vice President of Kuaishou and technical lead behind Kling AI, had joined Alibaba's Taotian Group in late 2025 and assembled a team at the Future Life Laboratory (未来生活实验室). Collaborators included Sand.ai (founded by Cao Yue, a specialist in autoregressive world models) and the GAIR Lab at Shanghai Institute of Intelligent Computing under Prof. Liu Pengfei.

HappyHorse is an evolution of the daVinci-MagiHuman project open-sourced in March 2026, refined and optimized for real-user preference scenarios.

Architecture: One Transformer to Rule Them All

HappyHorse 1.0 is a 15-billion-parameter unified Transformer with a single-stream, 40-layer self-attention architecture. What makes it distinctive:

  • Joint modeling — text tokens, image latents, video frames, and audio waveforms are packed into one sequence and denoised together. No cross-attention, no separate modules bolted on as afterthoughts.
  • Shared core, specialized edges — the first and last 4 layers use modality-specific projections; the middle 32 layers share parameters across all modalities.
  • 8-step denoising via DMD-2 distillation — no Classifier-Free Guidance needed, which slashes inference cost.
  • Native 1080p output with 16:9 and 9:16 aspect ratios, producing 5–8 second clips.
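The "shared core, specialized edges" layout can be sketched in PyTorch. This is a toy illustration under our own assumptions (tiny dimensions, off-the-shelf encoder layers, class and variable names are ours), not the actual HappyHorse code:

```python
import torch
import torch.nn as nn

# Toy sketch: per-modality tokens pass through modality-specific "edge"
# layers (first and last 4), while the middle 32 layers are shared and
# attend over all modalities packed into a single sequence.
D_MODEL, N_HEAD = 256, 8          # illustrative sizes, not the real 15B config
EDGE_LAYERS, CORE_LAYERS = 4, 32
MODALITIES = ["text", "image", "video", "audio"]

def make_layers(n):
    return nn.ModuleList(
        nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
        for _ in range(n)
    )

class SharedCoreTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        # modality-specific entry/exit stacks ("specialized edges")
        self.entry = nn.ModuleDict({m: make_layers(EDGE_LAYERS) for m in MODALITIES})
        self.exit = nn.ModuleDict({m: make_layers(EDGE_LAYERS) for m in MODALITIES})
        # 32 middle layers shared across all modalities
        self.core = make_layers(CORE_LAYERS)

    def forward(self, tokens: dict) -> dict:
        hs, spans, offset = [], {}, 0
        for m, x in tokens.items():
            for layer in self.entry[m]:
                x = layer(x)
            spans[m] = (offset, offset + x.shape[1])
            offset += x.shape[1]
            hs.append(x)
        # pack everything into one sequence for joint self-attention
        h = torch.cat(hs, dim=1)
        for layer in self.core:
            h = layer(h)
        # route each modality's span back through its exit stack
        out = {}
        for m, (s, e) in spans.items():
            x = h[:, s:e]
            for layer in self.exit[m]:
                x = layer(x)
            out[m] = x
        return out

model = SharedCoreTransformer()
batch = {m: torch.randn(1, 8, D_MODEL) for m in MODALITIES}
out = model(batch)
print({m: tuple(t.shape) for m, t in out.items()})
```

The design point the sketch captures: because all modalities share the core, the model learns cross-modal alignment (e.g., audio synced to lip motion) in the same attention passes that denoise the video, rather than through a bolted-on cross-attention bridge.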

This is the first time the open-source community has achieved true end-to-end audio-video joint pre-training from scratch in a single model.
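The 8-step, guidance-free sampling mentioned above can be illustrated generically. This is a minimal sketch of DMD-style few-step sampling with a toy stand-in denoiser, not HappyHorse's released inference code:

```python
import torch

# Few-step sampling as used by DMD-style distilled diffusion models:
# the student runs for only 8 steps, and because guidance behavior is
# distilled into the weights, there is no second classifier-free
# guidance forward pass per step.
NUM_STEPS = 8

def denoiser(x_t, t):
    # stand-in for the distilled network's single forward pass;
    # predicts the clean sample x0 from the noisy latent x_t at time t
    return x_t * (1.0 - t)  # toy dynamics, not a real model

def sample(shape=(1, 4, 8, 8), seed=0):
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(shape, generator=g)            # start from pure noise
    timesteps = torch.linspace(1.0, 0.0, NUM_STEPS + 1)
    for i in range(NUM_STEPS):
        t, t_next = timesteps[i], timesteps[i + 1]
        x0_pred = denoiser(x, t)                   # one pass per step, no CFG
        # re-noise the prediction down to the next (lower) noise level
        noise = torch.randn(shape, generator=g)
        x = (1 - t_next) * x0_pred + t_next * noise
    return x

video_latent = sample()
print(video_latent.shape)
```

With guidance, each step would cost two forward passes (conditional and unconditional); distilling it away halves per-step cost on top of the reduction from ~50 steps to 8.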

Performance: Speed Meets Quality

On a single NVIDIA H100 GPU:

  • ~2 seconds for a 5-second clip at 256p (preview quality)
  • ~38 seconds for a 5-second clip at 1080p with synchronized audio

The model supports seven languages for lip-sync and voiceover — Mandarin, Cantonese, English, Japanese, Korean, German, and French — with a word error rate of 14.60%.

Fully Open Source

Unlike many "open" models that only release weights, HappyHorse ships everything:

  • Base model (15B parameters)
  • Distilled model (for faster inference)
  • Super-resolution module
  • Full inference code
  • Commercial-use license

The team released all components on GitHub and HuggingFace, making it one of the most complete open-source video generation packages ever published.

What This Means for Creators

HappyHorse represents a turning point. A fully open, commercially licensable model now matches or exceeds the quality of closed, API-only services. For creators and developers, this means:

  1. Self-hosting becomes viable for production video generation
  2. Fine-tuning on custom data is possible without vendor lock-in
  3. Cost per video drops dramatically when running on your own hardware
  4. Privacy — your prompts and outputs never leave your infrastructure
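Point 3 is easy to make concrete. Assuming an illustrative $3/hour H100 rental rate (our assumption, not a figure from this article) and the 38-second 1080p generation time reported above:

```python
# Back-of-envelope cost per clip on rented hardware.
H100_HOURLY_USD = 3.00   # illustrative cloud rental rate (assumption)
SECONDS_PER_CLIP = 38    # 5 s of 1080p video with audio, per the benchmark above

clips_per_hour = 3600 / SECONDS_PER_CLIP
cost_per_clip = H100_HOURLY_USD / clips_per_hour
print(f"{clips_per_hour:.1f} clips/hour, ${cost_per_clip:.3f} per clip")
```

Roughly three cents per clip before utilization overhead, which is the kind of margin that makes self-hosting attractive at volume.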

What's Next?

HappyHorse 1.0 is still fresh — weights just dropped, the community is benchmarking, and fine-tuned variants are already in the works. If the base model is this strong, the ecosystem around it could move fast.

We're keeping a close eye on HappyHorse at PicMorph. If it lives up to the hype in our own testing, expect to see it integrated alongside our existing video generation models soon.


The AI video generation landscape moves fast. HappyHorse proved that a small, focused team with the right architecture can leapfrog an entire industry — and then give it all away for free.
