FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

Neural Information Processing Systems 

Video diffusion models have made substantial progress in various video generation applications. However, training models for long video generation requires significant computational and data resources, which poses a challenge to developing long video diffusion models. This paper investigates a straightforward, training-free approach to extending an existing short video diffusion model (e.g.