Position-based Scaled Gradient for Model Quantization and Pruning - Appendix
Neural Information Processing Systems
In this experiment, we quantize only the weights, not the activations, to compare the performance degradation as the weight bit-width decreases. The mean squared error (MSE) of the weights at each bit-width is also reported. In Fig. A1, we display and compare the full-precision weight distributions of the PSGD models. Four random layers of each model are shown column-wise. The first row displays the model trained with SGD and L2 weight decay. This row is also reported in Figure 1 of the original paper.
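The weight-only quantization and MSE measurement described above can be illustrated with a minimal sketch. The excerpt does not state which quantizer is used, so this assumes symmetric uniform quantization with max-abs scaling; the function names `quantize_uniform` and `mse`, the synthetic Gaussian weights, and the list of bit-widths are all hypothetical choices for illustration, not the paper's setup.

```python
import random


def quantize_uniform(weights, bits):
    """Symmetric uniform quantization with max-abs scaling.

    Assumed scheme for illustration only; the paper's quantizer may differ.
    """
    if bits < 2:
        raise ValueError("bits must be >= 2 for a symmetric signed grid")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    levels = 2 ** (bits - 1) - 1   # e.g. 127 positive levels for 8 bits
    step = max_abs / levels        # spacing between adjacent grid points
    # Round each weight to the nearest grid index, clamp, and map back.
    return [max(-levels, min(levels, round(w / step))) * step for w in weights]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic stand-in for one layer's full-precision weights (an assumption).
random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]

# Quantize the weights only and record the weight MSE per bit-width.
results = {
    bits: mse(weights, quantize_uniform(weights, bits))
    for bits in (8, 6, 4, 3, 2)
}
for bits, err in results.items():
    print(f"{bits}-bit weights: MSE = {err:.3e}")
```

In this sketch the weight MSE grows as the bit-width shrinks, because the grid spacing roughly doubles with each bit removed. Activations are left untouched, matching the weight-only setting described above.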