Learning to Generate Rigid Body Interactions with Video Diffusion Models