MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization

Kim, Daeun; Hwang, Jinwoo; Oh, Changhun; Park, Jongse

Apr-14-2025, arXiv.org Artificial Intelligence

Diffusion Transformer (DiT) has driven significant progress in image generation tasks. However, DiT inference is notoriously compute-intensive and incurs long latency even on datacenter-scale GPUs, primarily because of its iterative nature and its heavy reliance on GEMM operations, which is inherent to its encoder-based structure. Prior work has explored quantization to address this challenge, but low-precision quantization of DiT inference that achieves both high accuracy and substantial speedup remains an open problem. This paper proposes MixDiT, an algorithm-hardware co-designed acceleration solution that exploits mixed Microscaling (MX) formats to quantize DiT activation values. MixDiT quantizes the DiT activation tensors by selectively applying higher precision to magnitude-based outliers, which produces mixed-precision GEMM operations. To achieve tangible speedup from the mixed-precision arithmetic, we design a MixDiT accelerator that enables precision-flexible multiplications and efficient MX precision conversions. Our experimental results show that MixDiT delivers a speedup of 2.10-5.32x over an RTX 3090, with no loss in FID.
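
The idea of giving magnitude-based outliers a higher precision can be illustrated with a small sketch. This is not the MixDiT algorithm: the abstract does not say how outliers are selected, at what granularity, or which MX formats are mixed. The sketch assumes a block size of 32, a shared power-of-two scale per block, signed integer elements (a simplification of the MX element formats), 4-bit and 8-bit widths, and a rule that promotes the blocks with the largest peak magnitude. All function names and parameters are hypothetical.

```python
import math
import random

BLOCK = 32  # elements that share one scale; 32 is an assumption of this sketch


def quantize_block(block, bits):
    """Fake-quantize one block with a shared power-of-two scale and signed integers."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in block)
    if amax == 0:
        return [0.0] * len(block)
    # Smallest power-of-two scale that keeps amax / scale within [-qmax, qmax].
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in block]


def mixed_precision_quantize(x, low_bits=4, high_bits=8, outlier_frac=0.1):
    """Quantize a flat list block by block.

    Blocks with the largest peak magnitude (the assumed outlier rule) use
    high_bits; every other block uses low_bits.
    Returns the dequantized values and the indices of the promoted blocks.
    """
    assert len(x) % BLOCK == 0, "length must be a multiple of BLOCK"
    blocks = [x[i:i + BLOCK] for i in range(0, len(x), BLOCK)]
    peaks = [max(abs(v) for v in b) for b in blocks]
    n_high = max(1, round(outlier_frac * len(blocks)))
    order = sorted(range(len(blocks)), key=lambda i: peaks[i])
    high = set(order[-n_high:])
    out = []
    for i, b in enumerate(blocks):
        out.extend(quantize_block(b, high_bits if i in high else low_bits))
    return out, sorted(high)


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Usage: four blocks of Gaussian activations with one injected outlier.
random.seed(0)
acts = [random.gauss(0.0, 1.0) for _ in range(4 * BLOCK)]
acts[70] = 50.0  # falls in block index 2
mixed, promoted = mixed_precision_quantize(acts, outlier_frac=0.25)
all_low, _ = mixed_precision_quantize(acts, low_bits=4, high_bits=4, outlier_frac=0.25)
print("promoted blocks:", promoted)
print("MSE mixed:", mse(acts, mixed), "MSE all-low:", mse(acts, all_low))
```

In this toy setting a single large value forces a coarse shared scale on its whole block, so quantizing only that block at the higher width recovers most of the lost accuracy while the remaining blocks stay at the low width. A GEMM over such a tensor then mixes operand precisions, which is the situation the abstract says the MixDiT accelerator is designed to handle.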

artificial intelligence, machine learning, quantization

Country:
- Asia > South Korea > Daejeon > Daejeon (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)