What is Apple's Quant for Neural Networks Quantization - Analytics India Magazine


Large neural networks are difficult to deploy in production because they are memory-intensive and slow at inference time. Many successful deep learning models, such as Transformers, have been followed by lighter versions that speed up inference considerably at some cost in accuracy. In this article, let's explore Least Squares Quantization, an algorithm that speeds up large neural networks by quantizing them, that is, representing their values with very few bits (as few as one) instead of 32-bit floating point, while narrowing the accuracy gap to the non-quantized model. Hadi Pouransari, Zhucheng Tu and Oncel Tuzel, researchers at Apple, introduced this approach in the paper "Least Squares Binary Quantization of Neural Networks" on 23rd March 2020. For practical purposes, smaller models are preferable because they use less memory and run inference faster.
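To make "least squares" concrete, here is a minimal sketch of the simplest case only: 1-bit (binary) quantization of a weight vector. It approximates the weights as `alpha * b`, where every entry of `b` is -1 or +1 and `alpha` is a single scale, choosing both to minimize the squared error. The well-known closed-form solution sets `b` to the sign of each weight and `alpha` to the mean absolute weight. The function names `binarize_least_squares` and `squared_error` are hypothetical, chosen for illustration. This is not the paper's full method, which goes beyond the 1-bit case.

```python
def binarize_least_squares(weights):
    """Approximate weights by alpha * b with b in {-1, +1}^n, minimizing
    the squared error ||weights - alpha * b||^2 (1-bit case only)."""
    if not weights:
        raise ValueError("weights must be non-empty")
    # For any alpha >= 0, the error is minimized entry by entry by matching signs.
    # Zeros map to +1 (either sign gives the same error for a zero weight).
    b = [1 if w >= 0 else -1 for w in weights]
    # With b fixed, the optimal scale is <weights, b> / n, which equals mean(|w|).
    alpha = sum(abs(w) for w in weights) / len(weights)
    return alpha, b


def squared_error(weights, alpha, b):
    """Squared reconstruction error of the binary approximation alpha * b."""
    return sum((w - alpha * s) ** 2 for w, s in zip(weights, b))


weights = [0.9, -0.4, 0.1, -1.2]
alpha, b = binarize_least_squares(weights)
print("scale:", alpha, "signs:", b)
print("error:", squared_error(weights, alpha, b))
```

Storing `b` takes one bit per weight plus a single float for `alpha`, instead of 32 bits per weight. Nudging `alpha` up or down from the mean absolute value only increases the reconstruction error.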