
SeqWM: Sequential World Models for Multi-Robot Cooperation

April 3, 2026 · 3 min read · multi-robot-systems, world-models


Recent breakthroughs in decision-coupled world models and model-based reinforcement learning (RL) have enabled robots to simulate future dynamics internally, supporting robust planning and decision-making. Scaling from a single robot to a multi-robot system, however, introduces a profound challenge: the environment's evolution is no longer governed by a single agent but emerges from the coupled actions of multiple autonomous agents.

The Core Challenge: Modeling Joint Dynamics

In multi-robot settings, a fundamental question arises: How can a world model understand and predict the joint dynamics of multiple agents?

Traditional approaches treat the system holistically, feeding all robots' states and actions into one monolithic model. As the number of robots grows, this joint modeling suffers from:
– 📉 Exponential complexity growth, impairing learning stability and generalization;
– ⚡ Causal entanglement, where simultaneous action inputs create gradient conflicts;
– 🔁 Broken decision-world feedback loops, leading to rapid error accumulation during prediction.
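
A back-of-the-envelope count makes the first point concrete. Assume, purely for illustration, that each robot chooses among k discrete actions (the robots in this article use continuous control). A monolithic model must then distinguish k**n joint actions for n robots, whereas in a sequential chain each of the n robots ranks only its own k options once its predecessors' choices are fixed. The function names below are hypothetical.

```python
# Illustrative count under an assumed discrete action set of size k per robot.
# This only shows how the joint action space scales; it is not part of SeqWM.


def joint_action_count(k: int, n: int) -> int:
    """Joint actions a monolithic model must cover: every combination of n robots."""
    return k ** n


def sequential_action_count(k: int, n: int) -> int:
    """Per-robot options summed over a chain of n robots, each ranking its own k actions."""
    return k * n


if __name__ == "__main__":
    k = 5  # assumed number of discrete actions per robot
    for n in (1, 2, 4, 8):
        print(f"n={n}: joint={joint_action_count(k, n)}, "
              f"sequential={sequential_action_count(k, n)}")
```

With k = 5, the joint count reaches 390,625 at n = 8, while the sequential count is 40.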


Introducing SeqWM: Causal Sequential Decomposition

Presented at ICLR 2026, the Sequential World Model (SeqWM), developed by the Deep Reinforcement Learning Group at the Chinese Academy of Sciences, rethinks multi-robot world modeling through causal, sequential conditioning.

Rather than predicting global state transitions in one step, SeqWM decomposes joint dynamics into an ordered chain of marginal causal contributions: each robot models only its own effect on the environment, conditioned on the predicted actions of the robots that precede it in the sequence.
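
The decomposition can be illustrated with a minimal sketch. It assumes that each robot's model returns a vector-valued contribution and that contributions are summed onto the current state; the article does not say how SeqWM composes contributions, and the `MarginalModel` signature and the toy `pusher` model are hypothetical. The point is the conditioning pattern: robot i sees only the actions of robots 0..i-1.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical per-robot model signature: (state, own action, predecessors'
# actions) -> this robot's marginal contribution to the next state.
MarginalModel = Callable[[Vector, Vector, Sequence[Vector]], Vector]


def sequential_predict(state: Vector,
                       actions: Sequence[Vector],
                       models: Sequence[MarginalModel]) -> Vector:
    """Compose per-robot marginal predictions in a fixed robot order.

    Robot i is conditioned only on the actions of robots 0..i-1, mirroring
    the causal chain. Summing contributions is an assumption of this sketch.
    """
    next_state = list(state)
    for i, model in enumerate(models):
        predecessors = actions[:i]  # actions already decided upstream
        delta = model(state, actions[i], predecessors)
        next_state = [x + d for x, d in zip(next_state, delta)]
    return next_state


def pusher(state: Vector, own: Vector, prev: Sequence[Vector]) -> Vector:
    """Toy 1-D box-pushing model (hypothetical): a robot's push is damped
    when robots earlier in the order are already pushing the box."""
    upstream = sum(a[0] for a in prev)
    damping = 1.0 / (1.0 + abs(upstream))
    return [0.5 * own[0] * damping]


if __name__ == "__main__":
    # Two robots push a box at x = 0 with unit force each.
    print(sequential_predict([0.0], [[1.0], [1.0]], [pusher, pusher]))
```

This prints `[0.75]`: the first robot contributes 0.5, and the second, seeing its predecessor's push, contributes 0.25. Swapping the robot order would change which model is damped, which is exactly the order dependence a sequential decomposition introduces.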

Key Technical Insights

  • Marginalized Prediction: Each robot maintains its own lightweight world model, trained to estimate its incremental impact rather than the full system outcome.
  • Intent Sharing via Trajectory Forecasting: Robots plan in sequence and broadcast their predicted trajectories, so downstream robots can adapt proactively rather than reactively.
  • MPPI-Based Collaborative Planning: Uses Model Predictive Path Integral (MPPI) control, a sampling-based planner, to optimize each robot's actions against the trajectories shared by the robots before it, making intent alignment explicit.
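
The sequential planning loop described in these bullets can be sketched as follows. This is a toy, not SeqWM's implementation: each robot is assumed to be a 1-D point with known single-integrator dynamics (`rollout`), standing in for its learned world model, and the cost terms, safety distance, and all parameter values (`DT`, `SAFE_DIST`, `COLLISION_W`, `horizon`, `samples`, `sigma`, `lam`, `iters`) are hypothetical. It shows the pattern only: robots plan in a fixed order with MPPI, and each robot's cost penalizes crowding the trajectories its predecessors have already broadcast.

```python
import math
import random
from typing import List, Sequence

DT = 0.1           # assumed control period (s)
SAFE_DIST = 0.5    # assumed minimum spacing between robots (m)
COLLISION_W = 50.0  # assumed weight of the spacing penalty


def rollout(p0: float, controls: Sequence[float]) -> List[float]:
    """Toy stand-in for a learned model: p_{t+1} = p_t + DT * u_t."""
    traj, p = [], p0
    for u in controls:
        p += DT * u
        traj.append(p)
    return traj


def traj_cost(traj: Sequence[float], goal: float,
              others: Sequence[Sequence[float]]) -> float:
    """Goal-tracking cost plus a penalty for crowding broadcast trajectories."""
    cost = 0.0
    for t, p in enumerate(traj):
        cost += (p - goal) ** 2
        for other in others:
            gap = abs(p - other[t])
            if gap < SAFE_DIST:
                cost += COLLISION_W * (SAFE_DIST - gap) ** 2
    return cost


def mppi(p0, goal, others, rng, horizon=20, samples=128,
         sigma=1.0, lam=1.0, iters=8):
    """MPPI: perturb the nominal controls, roll out, softmin-weight, average."""
    nominal = [0.0] * horizon
    for _ in range(iters):
        noises = [[rng.gauss(0.0, sigma) for _ in range(horizon)]
                  for _ in range(samples)]
        costs = [traj_cost(rollout(p0, [n + e for n, e in zip(nominal, eps)]),
                           goal, others)
                 for eps in noises]
        c_min = min(costs)  # subtract the minimum for numerical stability
        weights = [math.exp(-(c - c_min) / lam) for c in costs]
        total = sum(weights)  # >= 1, since the best sample has weight 1
        nominal = [n + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
                   for t, n in enumerate(nominal)]
    return nominal, rollout(p0, nominal)


def plan_team(starts, goals, rng, **kwargs):
    """Plan robots in a fixed order; each one sees its predecessors' plans."""
    shared: List[List[float]] = []  # trajectories broadcast so far
    plans: List[List[float]] = []
    for p0, goal in zip(starts, goals):
        controls, traj = mppi(p0, goal, shared, rng, **kwargs)
        plans.append(controls)
        shared.append(traj)  # broadcast intent to downstream robots
    return plans, shared


if __name__ == "__main__":
    _, trajs = plan_team([0.0, 1.0], [2.0, 3.0], random.Random(0))
    print([round(t[-1], 2) for t in trajs])  # final position of each robot
```

Planning in a fixed order means the last robot sees every other plan while the first sees none; how SeqWM orders robots is not covered here, so the fixed order is an assumption of the sketch.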

Figure 1 (SeqWM architecture overview): SeqWM's sequential decomposition enables scalable, interpretable multi-robot world modeling.


Experimental Validation

SeqWM was evaluated on two high-fidelity simulation benchmarks:

  • 🤲 Bi-DexHands: Dual dexterous manipulation tasks (e.g., cooperative object assembly);
  • 🐕 Multi-Quadruped: Coordinated locomotion and navigation with multiple quadruped (four-legged) robots.

The results show that SeqWM consistently outperforms state-of-the-art (SOTA) methods in both task success rate (+23.7%) and sample efficiency (41% fewer samples needed).

Figure 4 (performance comparison): Simulation results demonstrate superior coordination and robustness.


Emergent Collaborative Behaviors

Beyond improving metrics, SeqWM fosters natural, human-like collaboration:

🧠 Predictive Adaptation

Robots anticipate their partners' motions and adjust preemptively. In ball-catching tasks, receivers move toward the predicted drop zones before the ball is released, enabling stable grasping under uncertainty.


👥 Role Division

In box-pushing scenarios, roles emerge on their own: one robot applies the dominant forward force while another fine-tunes the box's orientation, without handcrafted reward shaping or communication protocols.



Real-World Deployment: Sim-to-Real Transfer

SeqWM was successfully deployed on physical Unitree Go2-W quadruped robots, validating real-world viability in three complex tasks:
– 📦 Cooperative box pushing;
– 🚪 Narrow doorway traversal;
– 🎯 Target-guided navigation.


Behavior fidelity between simulation and reality exceeded 92%, confirming strong sim-to-real transferability.


Conclusion & Outlook

SeqWM reframes multi-robot world modeling not as a global inference problem, but as a structured, sequential causal process. By decoupling joint dynamics into composable, conditionally scoped predictions, it delivers:
– ✅ Scalable training and deployment;
– ✅ Interpretable, modular architecture;
– ✅ Emergent, adaptive collaboration;
– ✅ Seamless sim-to-real transfer.

As embodied AI advances, SeqWM offers a foundational blueprint for building robot teams that collaborate, rather than merely coexist, with human-level intentionality and flexibility.

