Position-based Scaled Gradient for Model Quantization and Pruning - Appendix

Neural Information Processing Systems 

In this experiment, we quantize only the weights, not the activations, to measure how performance degrades as the weight bit-width decreases. The mean squared errors (MSE) of the weights across different bit-widths are also reported. In Fig. A1, we display and compare the full-precision weight distributions of the PSGD models. Four random layers of each model are shown column-wise. The first row displays the model trained with SGD and L2 weight decay. This is also reported in Figure 1 of the original paper.
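
To make the weight-only setting concrete, the following is a minimal sketch of how a weight MSE per bit-width could be computed. It assumes a symmetric uniform quantizer with max-abs scaling, which may differ from the quantizer used in the paper; the names `quantize_uniform` and `weight_mse`, as well as the synthetic Gaussian weights, are hypothetical and serve only as an illustration.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantization to `bits` bits (assumed scheme, not the paper's)."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    max_abs = np.max(np.abs(w))
    if max_abs == 0 or levels == 0:
        return np.zeros_like(w)
    scale = max_abs / levels              # step size between adjacent grid points
    return np.clip(np.round(w / scale), -levels, levels) * scale

def weight_mse(w, bits):
    """Mean squared error between full-precision and quantized weights."""
    return float(np.mean((w - quantize_uniform(w, bits)) ** 2))

# Synthetic stand-in for one layer's full-precision weights (assumption).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=10_000)

mse_by_bits = {b: weight_mse(weights, b) for b in (8, 6, 4, 3, 2)}
for b, m in mse_by_bits.items():
    print(f"{b}-bit: MSE = {m:.3e}")
```

Activations are left untouched in this sketch, mirroring the weight-only comparison described above; in practice the same function would be applied to each layer's weight tensor of a trained model.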
