Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Jun-13-2026, 05:43:27 GMT–Neural Information Processing Systems

Diffusion Transformers (DiTs) are essential for video generation but suffer from significant latency due to the quadratic complexity of attention. By computing only critical tokens, sparse attention reduces computational costs and offers a promising acceleration approach. However, we identify that existing methods fail to approach optimal generation quality under the same computation budget for two reasons: (1) Inaccurate critical token identification: current methods cluster tokens based on position rather than semantics, leading to imprecise aggregated representations.
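As a rough illustration of why position-based grouping can give imprecise aggregated representations, the sketch below compares contiguous positional blocks against clusters formed on the token vectors themselves. It is not the paper's method: the synthetic key vectors, the use of plain k-means, and the helper names `centroid_error` and `kmeans_labels` are all assumptions made for this example.

```python
import numpy as np


def centroid_error(x, labels):
    """Mean squared distance from each token to the centroid of its cluster."""
    err = 0.0
    for c in np.unique(labels):
        members = x[labels == c]
        err += ((members - members.mean(axis=0)) ** 2).sum()
    return err / len(x)


def kmeans_labels(x, k, iters=10):
    """Plain k-means with a deterministic init (hypothetical helper)."""
    centroids = x[:k].copy()  # assumption: the first k tokens seed the clusters
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Squared distance from every token to every centroid.
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = x[labels == c].mean(axis=0)
    return labels


# Synthetic tokens (an assumption): two semantic groups interleaved by position.
rng = np.random.default_rng(0)
n_tokens, dim = 64, 8
group = np.arange(n_tokens) % 2
centers = np.array([[3.0] * dim, [-3.0] * dim])
keys = centers[group] + 0.1 * rng.standard_normal((n_tokens, dim))

# Position-based clustering: two contiguous blocks of 32 tokens each.
position_labels = np.arange(n_tokens) // (n_tokens // 2)
# Semantics-based clustering: group tokens by similarity of their vectors.
semantic_labels = kmeans_labels(keys, k=2)

position_err = centroid_error(keys, position_labels)
semantic_err = centroid_error(keys, semantic_labels)
print(f"position-based error: {position_err:.3f}")
print(f"semantic error:       {semantic_err:.3f}")
```

In this toy setup each positional block mixes both semantic groups, so its centroid sits far from every token it is meant to summarize, while the similarity-based clusters keep their centroids close to their members.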



