Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
Neural Information Processing Systems
Diffusion Transformers (DiTs) are essential for video generation but suffer from significant latency due to the quadratic complexity of attention. By computing attention only over critical tokens, sparse attention reduces computational cost and offers a promising approach to acceleration. However, we identify that existing methods fail to approach optimal generation quality under the same computation budget for two reasons: (1) Inaccurate critical token identification: current methods cluster tokens based on position rather than semantics, leading to imprecise aggregated representations.
Jun-13-2026, 05:43:27 GMT