Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
