

VORTA


[Figure: sample video generations. Recoverable prompt fragments: "a joyful Corgi with a fluffy coat and perky ..."; "bounds through a sunlit park"; "a young woman with curly hair and a bright smile"; "wearing a yellow sweater".]

Neural Information Processing Systems

Video diffusion transformers have achieved remarkable progress in high-quality video generation, but remain computationally expensive due to the quadratic complexity of attention over high-dimensional video sequences. Recent acceleration methods improve efficiency by exploiting the local sparsity of attention scores, yet they often struggle with accelerating the long-range computation. To address this problem, we propose VORTA, an acceleration framework with two novel components: (1) a sparse attention mechanism that efficiently captures long-range dependencies, and (2) a routing strategy that adaptively replaces full 3D attention with specialized sparse attention variants. VORTA achieves an end-to-end speedup of 1.76× without loss of quality on VBench. It can also be integrated with various other acceleration methods.
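As an illustration of the cost argument in the abstract, the following minimal sketch compares full attention with a local (sliding-window) variant and counts the query-key pairs each one evaluates. It is not the VORTA implementation: the function names `attention` and `route`, the `window` parameter, and the argmax selection rule are assumptions made for this example, and the sliding window stands in for the "local sparsity" methods the abstract mentions rather than for the paper's own long-range mechanism.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attention(q, k, v, window=None):
    """Scaled dot-product attention over lists of equal-length vectors.

    window=None gives full attention (every query sees every key).
    Otherwise query i only sees keys j with |i - j| <= window.
    Returns (outputs, number of query-key pairs evaluated).
    """
    n, d = len(q), len(q[0])
    outputs, pairs = [], 0
    for i in range(n):
        if window is None:
            lo, hi = 0, n
        else:
            lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(lo, hi)
        ]
        weights = softmax(scores)
        pairs += hi - lo
        outputs.append(
            [sum(weights[t] * v[lo + t][c] for t in range(hi - lo)) for c in range(d)]
        )
    return outputs, pairs


def route(branch_scores):
    """Hypothetical router: pick the attention branch with the highest score."""
    return max(branch_scores, key=branch_scores.get)


# Toy sequence of 8 tokens with 4-dimensional features (deterministic values).
n, d = 8, 4
q = [[math.sin(i + c) for c in range(d)] for i in range(n)]
k = [[math.cos(i - c) for c in range(d)] for i in range(n)]
v = [[float(i + c) for c in range(d)] for i in range(n)]

full_out, full_pairs = attention(q, k, v)
local_out, local_pairs = attention(q, k, v, window=1)
branch = route({"full": 0.2, "local": 0.8})
```

In this toy setting full attention evaluates n² = 64 pairs, while the window of radius 1 evaluates 22, so the local variant grows linearly in sequence length for a fixed window. A router in this style would choose, per layer or head, whether the cheaper branch is acceptable; how VORTA scores and selects its branches is not specified in the abstract.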