Scalable methods for 8-bit training of neural networks

Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry

Neural Information Processing Systems (NeurIPS), 2018

Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase, i.e., after the network has been trained. Extensive research in the field has proposed many different quantization schemes. Still, the number of bits required, as well as the best quantization scheme, remain unknown. Our theoretical analysis suggests that most of the training process is robust to substantial precision reduction, and it points to only a few specific operations that require higher precision.
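
For illustration, below is a minimal sketch of the kind of uniform symmetric quantization such schemes build on. The function names and the max-based scaling rule are assumptions chosen for this example, not the paper's exact scheme.

import numpy as np

def quantize_int8(x, num_bits=8):
    """Uniformly quantize a tensor using a symmetric scale derived
    from the tensor's maximum absolute value (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax        # map dynamic range onto [-qmax, qmax]
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from its quantized form."""
    return q.astype(np.float32) * scale

# Example: measure the quantization error on a random weight tensor
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.max(np.abs(w - w_hat)))

The round-trip error printed here is the price paid for reduced precision; the abstract's claim is that most training operations tolerate this error, while a few do not.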