VILP: Imitation Learning with Latent Video Planning