
World Models: The Dreamcore Arms Race

7 May 2026 · 5 min read · AI-research, world-models


World Models Are the Dreamcore Arms Race

「World Model Interpretation Framework」

In obscure corners of the AI ecosystem, world models are erupting en masse — like spring bamboo after rain.

This is no exaggeration. In April, Alibaba quietly began internal testing of its world model, Happy Oyster, ahead of the public release of its video model, Happy Horse. Reviews of Happy Horse flooded feeds, yet almost no one examined Happy Oyster, despite its bold ambition.

Meanwhile:
– Tencent released and open-sourced Hunyuan 3D World Model 2.0, yet it drew less attention than the short-lived buzz around Honor of Kings: World.
– Across the Pacific, figures like Fei-Fei Li and Jensen Huang unveiled next-gen world models — all met with near silence.
– Even Lingguang shipped a mobile-first world model — “AI world-building, but only on smartphones.”
– ByteDance is widely suspected to be developing its own — possibly codenamed Seedworld — completing the industry-wide sweep.

So why have none of these tools triggered the usual media frenzy, with no “game over” or “AGI is here” headlines?

The answer is simple: they’re still deeply flawed.


🦪 Happy Oyster: Two Modes, One Reality Check

Happy Oyster offers two interaction paradigms: Director Mode and Wandering Mode.

Director Mode (Real-Time Video Generation)

In Director Mode, users steer short video generation via dynamic prompts, like playing an AI-powered interactive film game.

But like Pixverse R1 and Odyssey-2, it suffers from core limitations:
– ❌ No scene continuity: Newly generated frames overwrite prior context.
– ❌ Error accumulation: Visual coherence degrades rapidly over time.

To test it, we prompted: “An astronaut walking on the Moon”, then iteratively added:

“An alien appears” → “They shake hands” → “They marry and have children” → “Alien acid rain destroys their nest”

Happy Oyster Moon Test

Observation: Alibaba sidestepped continuity issues by cutting scenes every few seconds — restarting generation from scratch. Clever? Yes. A true world model? Debatable.

While characters persisted (e.g., the alien stayed), consistency in appearance, motion, and narrative collapsed — yielding visuals reminiscent of Western dreamcore aesthetics: surreal, glitchy, emotionally uncanny.
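
The error-accumulation problem, and why cutting scenes every few seconds masks it, can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `rollout_drift`, the noise level, and the 48-frame cut interval are all hypothetical, and nothing here reflects how Happy Oyster actually works internally. Each generated frame inherits the previous frame's error and adds a little of its own; a scene cut resets that error to zero.

```python
import random

def rollout_drift(steps, noise=0.02, reset_every=None, seed=0):
    """Toy autoregressive rollout (hypothetical, for illustration only).

    Each frame inherits the previous frame's error and adds its own.
    Returns the accumulated drift after every step.
    """
    rng = random.Random(seed)
    drift = 0.0
    history = []
    for step in range(steps):
        if reset_every and step > 0 and step % reset_every == 0:
            drift = 0.0  # scene cut: generation restarts from a clean state
        drift += abs(rng.gauss(0.0, noise))  # new error stacks on old error
        history.append(drift)
    return history

continuous = rollout_drift(240)                 # one long, uncut take
with_cuts = rollout_drift(240, reset_every=48)  # a cut every 48 frames

print(f"max drift, one continuous take: {max(continuous):.2f}")
print(f"max drift, cut every 48 frames: {max(with_cuts):.2f}")
```

In this toy setup the uncut take only ever drifts further from its starting point, while the version with cuts stays bounded. The price is the one noted above: every cut throws away the accumulated scene, so continuity is traded for stability rather than solved.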

Then we uploaded a photo of Doubao and prompted: “I’m Done For! I’m Surrounded by Bean Sisters!”

Doubao Game Test

  • A horse mutated into a two-headed abomination.
  • Doubao duplicated like the twins in The Shining, terrifying at first glance.
  • Yet — remarkably — all agents spoke and reacted without explicit scripts. Vague instructions drove evolving dialogue and behavior.

💡 This hints at a compelling niche: AI virtual companions.

Imagine Linglong, Yongchu Tafi, or Love and Deepspace characters as persistent, responsive, audio-enabled digital twins, all running inside a world model. Not just chat, but co-existence.

Still: Is real-time video generation really a world model? Or just a highly interactive video agent?


Wandering Mode (Interactive 3D World Simulation)

Here, users define scene + character via dual prompts, then freely navigate the generated environment — closely mirroring Google’s Genie 3.

Happy Oyster vs Genie 3 UI

⚠️ Fun fact: The English UI shown above belongs to Happy Oyster — the Chinese one is Genie 3.

We benchmarked both using four physics-aware scenarios:

🏭 Scenario 1: Shenzhen Electronics Factory

  • Genie 3: Generated a convincing factory floor. Navigation was stable — but interactions felt stiff; collisions were clunky; textures resembled low-res 3D assets.

Genie 3 Factory

  • Happy Oyster: Higher visual fidelity, with less “blobby” avatars, but severe instability:
      – Returning to your workstation? It’s gone.
      – Spinning 360°? A wall becomes a corridor.
      – Suddenly, two new coworkers appear, as if HR hired them mid-session.

This isn’t simulation — it’s digital liminal space horror.
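
The vanishing workstation is what one would expect from a generator that re-samples a location on every visit instead of storing it. The sketch below contrasts the two behaviours with a simple revisit-consistency score. It is a toy model only: `StatelessWorld`, `PersistentWorld`, and `revisit_consistency` are hypothetical names invented for this illustration and say nothing about the real architecture of Happy Oyster or Genie 3.

```python
import random

OBJECTS = ["workstation", "wall", "corridor", "coworker"]

class StatelessWorld:
    """Re-samples whatever is at a location on every visit (no memory)."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def look(self, position):
        return self.rng.choice(OBJECTS)

class PersistentWorld:
    """Samples a location once, then keeps it in a global state map."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = {}  # position -> object, shared across all visits

    def look(self, position):
        if position not in self.state:
            self.state[position] = self.rng.choice(OBJECTS)
        return self.state[position]

def revisit_consistency(world, path, laps=20):
    """Fraction of revisits that show the same object as the first visit."""
    first_seen = {}
    matches = 0
    revisits = 0
    for _ in range(laps):
        for position in path:
            seen = world.look(position)
            if position in first_seen:
                revisits += 1
                matches += int(seen == first_seen[position])
            else:
                first_seen[position] = seen
    return matches / revisits

# Walk the same four-cell loop repeatedly, like pacing around a factory floor.
path = [(0, 0), (0, 1), (1, 1), (1, 0)]
stateless_score = revisit_consistency(StatelessWorld(), path)
persistent_score = revisit_consistency(PersistentWorld(), path)
print("stateless :", stateless_score)
print("persistent:", persistent_score)
```

A score like this could serve as a crude probe for any interactive world model: walk a loop, return to the start, and measure how often the scene still matches what was there the first time.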

🐉 Scenario 2: Fire-Breathing Dragon in Forest

Both dragons forgot their purpose mid-walk — a shared flaw pointing to absent memory architecture.

Happy Oyster Dragon
Genie 3 Dragon

🚦 Scenario 3: Crosswalk Physics Test

  • Happy Oyster: Cars stop when you cross — then resume. A glimmer of emergent traffic logic.

Happy Oyster Crosswalk

  • Genie 3: All cars freeze — not due to caution, but because its 60-second limit coincides with a red light. A bug disguised as civility.

Genie 3 Crosswalk

💥 Scenario 4: Collision World

  • Happy Oyster: Smooth Cybertruck crashes — occasional clipping, but kinetic.

Happy Oyster Crash

  • Genie 3: Strong impact feedback, evasive pedestrians — feels like GTA VI, but with even more clipping.

Genie 3 Crash


🔍 What Is a World Model — Really?

A true world model should solve two fundamental AGI bottlenecks:

  1. Escaping LLM Dominance: Offering a new paradigm beyond language — one grounded in spatial, causal, and physical reasoning.
  2. Enabling Embodied Interaction: Letting humans and AI coexist and act within shared 3D environments, bridging silicon and carbon.

Yet current implementations fall short:
– They lack persistent memory and global state.
– Physics simulation remains shallow — rule-based, not learned.
– Multi-agent coordination is brittle or disabled entirely.

As Google admits openly in its Genie 3 documentation, these are early-stage prototypes — not production-ready simulators.


🌑 The Dreamcore Arms Race

This isn’t innovation — it’s strategic anxiety.

Companies race to ship half-baked world models not because they’re useful, but because:

“If we don’t publish something now, investors might think we’ve missed the wave.”

It mirrors the pre-Hiroshima nuclear arms race — captured vividly in the play Copenhagen: scientists building reactors they barely understand, terrified that someone else might be building the bomb.

Today’s world model landscape resembles a dark forest, upgraded: Everyone knows most releases are vaporware — yet no one dares pause. Why? Because what if the other guy isn’t faking it?

So we get:
– A flood of press releases.
– A torrent of demo videos.
– A sea of inconsistent, unstable, dreamlike simulations.

And yet — this feverish activity matters. It fuels research, attracts talent, drives infrastructure investment, and — yes — generates GDP.

As the author puts it:

“It’s fine that you burn money on unrealistic things, as long as it’s not my money. Go explore Mars. Just keep pulling off the hardcore stunts.”

(Cover image generated by ChatGPT. Article written manually by Luo Zima, published by ‘Burial AI’.)


This article was originally published on AITNT — China’s leading one-stop AI news and intelligence platform.