S2Q-VDiT: Accurate Quantized Video Diffusion Transformer with Salient Data and Sparse Token Distillation
–Neural Information Processing Systems
To address these issues, we propose S2Q-VDiT, a post-training quantization framework for video diffusion models (V-DMs) that leverages Salient data and Sparse token distillation. During the calibration phase, we find that quantization performance is highly sensitive to the choice of calibration data. To mitigate this, we introduce Hessian-aware Salient Data Selection, which constructs high-quality calibration datasets by considering both the diffusion and the quantization characteristics unique to V-DMs. To tackle the learning challenges, we further analyze the sparse attention patterns inherent in V-DMs.
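As a rough illustration of what a Hessian-aware selection of calibration data could look like, the sketch below ranks candidate samples by a Hessian-weighted quantization error and keeps the top k. This is not the paper's algorithm: the diagonal Hessian approximation, the scalar uniform quantizer, the optional `diffusion_weights` factor, and all function names are assumptions made for this example only.

```python
from typing import List, Optional, Sequence


def quantize(x: float, scale: float, bits: int = 4) -> float:
    """Uniform symmetric fake-quantization of a scalar (assumed quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(x / scale)))
    return q * scale


def hessian_weighted_error(
    values: Sequence[float],
    hessian_diag: Sequence[float],
    scale: float,
    bits: int = 4,
) -> float:
    """Sum of h_i * (x_i - Q(x_i))^2, using a diagonal Hessian approximation."""
    return sum(
        h * (v - quantize(v, scale, bits)) ** 2
        for v, h in zip(values, hessian_diag)
    )


def select_salient(
    candidates: Sequence[Sequence[float]],
    hessian_diags: Sequence[Sequence[float]],
    scale: float,
    k: int,
    bits: int = 4,
    diffusion_weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return the indices of the k highest-scoring candidates.

    diffusion_weights is a hypothetical per-candidate factor standing in for
    the diffusion-side term; it defaults to 1.0 for every candidate.
    """
    if diffusion_weights is None:
        diffusion_weights = [1.0] * len(candidates)
    scores = [
        w * hessian_weighted_error(c, h, scale, bits)
        for c, h, w in zip(candidates, hessian_diags, diffusion_weights)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy usage: three candidates, two activations each, unit Hessian diagonal.
candidates = [[0.0, 1.0], [0.3, 0.7], [0.26, 0.74]]
hessians = [[1.0, 1.0]] * 3
selected = select_salient(candidates, hessians, scale=0.5, k=2)
```

In this toy setting the first candidate lies exactly on the quantization grid, so it scores zero and is not selected. A real pipeline would obtain the Hessian information from the model rather than pass it in by hand.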
Jun-16-2026, 03:06:52 GMT
- Country:
  - Asia (0.28)
- Genre:
  - Research Report > Experimental Study (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning > Neural Networks (0.93)