ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction