Neural Information Processing Systems
Video diffusion transformers have achieved remarkable progress in high-quality video generation, but remain computationally expensive due to the quadratic complexity of attention over high-dimensional video sequences. Recent acceleration methods enhance efficiency by exploiting the local sparsity of attention scores; yet they often struggle with accelerating the long-range computation. To address this problem, we propose VORTA, an acceleration framework with two novel components: (1) a sparse attention mechanism that efficiently captures long-range dependencies, and (2) a routing strategy that adaptively replaces full 3D attention with specialized sparse attention variants. VORTA achieves an end-to-end speedup of 1.76× without loss of quality on VBench, and can integrate with various other acceleration methods.
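To make the cost argument concrete, the following minimal sketch counts query-key pairs for full 3D attention over a frames × height × width token grid and for a purely local box window. It is only an illustration of quadratic versus local-window cost, not VORTA's sparse attention or routing; the function names, grid size, and window radii are hypothetical choices made for this example.

```python
def full_attention_pairs(frames: int, height: int, width: int) -> int:
    """Query-key pairs when every token attends to every token (quadratic in N)."""
    n = frames * height * width
    return n * n


def axis_window_pairs(size: int, radius: int) -> int:
    """Pairs (i, j) along one axis with |i - j| <= radius, clipped at the borders."""
    return sum(
        min(size - 1, i + radius) - max(0, i - radius) + 1
        for i in range(size)
    )


def windowed_attention_pairs(
    frames: int, height: int, width: int, radius: tuple[int, int, int]
) -> int:
    """Pairs for a local box window; the count factorizes across the three axes."""
    r_t, r_h, r_w = radius
    return (
        axis_window_pairs(frames, r_t)
        * axis_window_pairs(height, r_h)
        * axis_window_pairs(width, r_w)
    )


if __name__ == "__main__":
    # Hypothetical latent grid: 16 frames of 30 x 52 tokens.
    grid = (16, 30, 52)
    full = full_attention_pairs(*grid)
    local = windowed_attention_pairs(*grid, radius=(2, 4, 4))
    print(f"full pairs:  {full}")
    print(f"local pairs: {local}")
    print(f"fraction of full cost: {local / full:.4f}")
```

A local window of this kind reduces cost sharply but, by construction, drops all pairs between distant tokens, which is the long-range computation the abstract says existing sparsity-based methods struggle to accelerate.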
Jun-14-2026, 15:07:47 GMT
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.92)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Machine Learning > Neural Networks (1.00)
    - Natural Language (0.93)
    - Representation & Reasoning (0.67)