HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

Oct-11-2024, 11:51:09 GMT–Neural Information Processing Systems

Quantization is an effective method for reducing the memory footprint and inference time of neural networks. However, ultra-low-precision quantization can significantly degrade model accuracy. A promising remedy is mixed-precision quantization, in which more sensitive layers are kept at higher precision. The search space for mixed-precision quantization, however, is exponential in the number of layers. Recent work has proposed a Hessian-based framework that aims to reduce this exponential search space by using second-order information.
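
To make the scale of that search space concrete, here is a minimal Python sketch. It counts the per-layer bit-width assignments, then ranks two toy layers by an estimated average Hessian trace. Everything in it is an illustrative assumption: the function names, the 2x2 "Hessian" blocks, and the choice of a Hutchinson-style randomized trace estimator. It is not taken from the paper's code and should not be read as its exact procedure.

```python
import random


def search_space_size(num_layers, num_bit_choices):
    """Count the distinct per-layer bit-width assignments."""
    return num_bit_choices ** num_layers


def make_matvec(matrix):
    """Wrap a dense matrix (list of rows) as a function computing H @ v."""
    def matvec(v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
    return matvec


def hutchinson_trace(matvec, dim, num_samples=200, seed=0):
    """Estimate trace(H) from matrix-vector products only.

    Uses random +/-1 (Rademacher) probe vectors v and averages v^T H v.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hv = matvec(v)
        total += sum(a * b for a, b in zip(v, hv))
    return total / num_samples


# Hypothetical per-layer Hessian blocks, made up for illustration.
layer_hessians = {
    "layer_a": [[4.0, 1.0], [1.0, 3.0]],   # exact trace 7.0
    "layer_b": [[0.5, 0.1], [0.1, 0.2]],   # exact trace 0.7
}

# Average trace per parameter, used here as a simple sensitivity score.
avg_trace = {
    name: hutchinson_trace(make_matvec(h), len(h)) / len(h)
    for name, h in layer_hessians.items()
}

# Layers with a larger score are treated as more sensitive, so they would
# be candidates for higher precision.
ranking = sorted(avg_trace, key=avg_trace.get, reverse=True)

if __name__ == "__main__":
    print(search_space_size(50, 4))  # assignments for 50 layers, 4 bit-widths
    print(ranking)
```

The point of the sketch is the contrast: exhaustively enumerating assignments grows as `num_bit_choices ** num_layers`, whereas a per-layer sensitivity score needs only one estimate per layer, which can then guide which layers keep more bits.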

Keywords: hessian aware trace-weighted quantization, neural network, quantization

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.63)
  - Representation & Reasoning (0.63)