Learning Plug-and-play Memory for Guiding Video Diffusion Models