Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets