MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization

Neural Information Processing Systems

The tremendous number of parameters in deep neural networks makes them impractical to deploy in real-world applications on edge devices, where computational power and storage space are limited. Existing studies have made progress on learning quantized deep models to reduce model size and energy consumption, i.e., converting full-precision weights ($r$'s) into discrete values ($q$'s) in a supervised training manner. However, the training process for quantization is non-differentiable, which leads to either infinite or zero gradients ($g_r$) w.r.t. $r$. To address this problem, most training-based quantization methods use the gradient w.r.t. $q$ to approximate $g_r$. However, these methods only heuristically make training-based quantization applicable, without further analysis of how the approximated gradients can assist the training of a quantized network. In this paper, we propose to learn $g_r$ by a neural network.
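
The non-differentiability described in the abstract can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the sign quantizer, the helper names (`quantize_sign`, `numerical_grad`, `ste_grad`), the symbol `g_q` for the gradient w.r.t. $q$, and the made-up gradient values. The hand-designed rule shown is the commonly used straight-through estimator with clipping, which is one example of a heuristic approximation and is not the learned approach the paper proposes.

```python
import numpy as np

def quantize_sign(r):
    """Map full-precision weights r to discrete values q in {-1, +1} (illustrative quantizer)."""
    return np.where(r >= 0, 1.0, -1.0)

def numerical_grad(f, r, eps=1e-4):
    """Element-wise central finite-difference estimate of df/dr."""
    return (f(r + eps) - f(r - eps)) / (2 * eps)

def ste_grad(g_q, r, clip=1.0):
    """Heuristic estimate of g_r: pass g_q through where |r| <= clip, zero elsewhere."""
    return g_q * (np.abs(r) <= clip)

# Hypothetical full-precision weights, none of them at the quantizer's jump (r = 0).
r = np.array([-1.5, -0.3, 0.2, 0.8])
q = quantize_sign(r)

# Away from the jump the quantizer is flat, so dq/dr is zero everywhere here.
dq_dr = numerical_grad(quantize_sign, r)

# Hypothetical gradient of the loss w.r.t. the quantized weights q.
g_q = np.array([0.4, -0.1, 0.25, -0.6])

# Chain rule with the true dq/dr gives no training signal at all.
g_r_true = g_q * dq_dr

# The straight-through rule copies g_q for weights inside the clipping range.
g_r_ste = ste_grad(g_q, r)

print("q        =", q)
print("dq/dr    =", dq_dr)
print("g_r true =", g_r_true)
print("g_r STE  =", g_r_ste)
```

With the exact chain rule every entry of `g_r_true` is zero, so gradient descent cannot update $r$. The clipped pass-through rule restores a signal but is fixed by hand, which is the kind of heuristic the abstract contrasts with learning $g_r$ by a neural network.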