Supplementary Material for PTQD: Accurate Post-Training Quantization for Diffusion Models, Yefei He

Neural Information Processing Systems

ZIP Lab, Monash University, Australia. We organize our supplementary material as follows: in Section A, we provide a comprehensive explanation of extending PTQD to DDIM [10]; in Section B, we present a statistical analysis of the quantization noise; in Section D, we provide additional visualization results on the ImageNet and LSUN datasets. We first perform statistical tests to verify whether the residual quantization noise adheres to a Gaussian distribution. The test is based on D'Agostino and Pearson's normality test. In Figure B, we present the variance of the residual uncorrelated quantization noise.
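
This check maps directly onto SciPy's implementation of D'Agostino and Pearson's test. Below is a minimal sketch, assuming the residual quantization noise has already been collected into a flat array; the synthetic residual_noise samples stand in for the actual collected noise and are not from the paper.

```python
import numpy as np
from scipy import stats

# Stand-in for the residual (uncorrelated) quantization noise; in practice
# this would be gathered as the difference between full-precision and
# quantized outputs after removing any correlated component.
rng = np.random.default_rng(0)
residual_noise = rng.normal(loc=0.0, scale=0.1, size=10_000)

# D'Agostino and Pearson's omnibus test combines skewness and kurtosis
# into a single test of normality.
statistic, p_value = stats.normaltest(residual_noise)
print(f"statistic={statistic:.3f}, p-value={p_value:.3g}")

# A large p-value means normality cannot be rejected at the chosen level.
if p_value > 0.05:
    print("Cannot reject the Gaussian hypothesis at the 5% level.")
else:
    print("Gaussian hypothesis rejected at the 5% level.")
```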


PTQD: Accurate Post-Training Quantization for Diffusion Models

Neural Information Processing Systems

Diffusion models have recently dominated image synthesis and other related generative tasks. However, the iterative denoising process is computationally expensive at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization of diffusion models can significantly reduce the model size and accelerate the sampling process without requiring any re-training. Nonetheless, applying existing post-training quantization methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, for each denoising step, quantization noise leads to deviations in the estimated mean and mismatches with the predetermined variance schedule. Moreover, as the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) during the later denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process.
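
The abstract stops short of the formulation itself, but together with the supplementary's mention of a "residual uncorrelated quantization noise", it suggests splitting the quantization noise into a component correlated with the full-precision output plus a Gaussian residual. The following is a rough sketch of one such disentanglement via least squares; the variable names, the synthetic calibration data, and the rescaling by 1/(1+k) are our illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Illustrative calibration data: full-precision noise prediction eps_fp and
# its quantized counterpart eps_q (names, shapes, and values are ours).
rng = np.random.default_rng(0)
eps_fp = rng.normal(size=100_000)
eps_q = 1.05 * eps_fp + rng.normal(scale=0.02, size=eps_fp.shape)

# Quantization noise is the gap between quantized and full-precision outputs.
delta = eps_q - eps_fp

# Least-squares fit of the linearly correlated part: delta ~ k * eps_fp.
k = np.dot(delta, eps_fp) / np.dot(eps_fp, eps_fp)
residual = delta - k * eps_fp

# The correlated part can be undone by rescaling the quantized output; the
# residual is treated as extra Gaussian noise whose variance must be
# accounted for against the predetermined variance schedule.
eps_corrected = eps_q / (1.0 + k)
print(f"estimated k = {k:.4f}, residual variance = {residual.var():.6f}")
```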


Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

Neural Information Processing Systems

We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model and must compress it without any retraining, based only on a small amount of calibration input data. This problem has become popular in view of the emerging software and hardware support for executing models compressed via pruning and/or quantization with speedup, and well-performing solutions have been proposed independently for both compression approaches. In this paper, we introduce a new compression framework which covers both weight pruning and quantization in a unified setting, is time- and space-efficient, and considerably improves upon the practical performance of existing post-training methods. At the technical level, our approach is based on an exact and efficient realization of the classical Optimal Brain Surgeon (OBS) framework of [LeCun, Denker, and Solla, 1990], extended to also cover weight quantization at the scale of modern DNNs. From the practical perspective, our experimental results show that it can improve significantly upon the compression-accuracy trade-offs of existing post-training methods, and that it can enable the accurate compound application of both pruning and quantization in a post-training setting.
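
To make the OBS realization concrete, here is a minimal NumPy sketch of the classical exact rule the framework builds on: zero the weight with the smallest saliency w_p^2 / (2 [H^-1]_pp) and compensate the remaining weights using the inverse Hessian. The toy quadratic Hessian, damping term, and dimensions are illustrative assumptions; the paper's contribution is making this rule exact and efficient at modern-DNN scale, which this sketch does not attempt.

```python
import numpy as np

# Toy layer-wise setting: Hessian H = X X^T from calibration inputs and a
# weight vector w (data and dimensions are illustrative only).
rng = np.random.default_rng(0)
d, n = 8, 256
X = rng.normal(size=(d, n))
H = X @ X.T + 1e-3 * np.eye(d)   # small damping for numerical stability
w = rng.normal(size=d)

H_inv = np.linalg.inv(H)

# OBS saliency: the loss increase from zeroing weight p and optimally
# updating the rest, w_p^2 / (2 * [H^-1]_pp).
saliency = w**2 / (2.0 * np.diag(H_inv))
p = int(np.argmin(saliency))     # cheapest weight to prune

# Exact OBS update: delta_w = -(w_p / [H^-1]_pp) * H^-1 e_p.
w_updated = w - (w[p] / H_inv[p, p]) * H_inv[:, p]
w_updated[p] = 0.0               # enforce exact removal

print(f"pruned index {p}, predicted loss increase = {saliency[p]:.6f}")
```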

