Reviews: Training Deep Neural Networks with 8-bit Floating Point Numbers

Neural Information Processing Systems 

The main goal of this work is to lower the numerical precision used to train deep neural networks, in order to better exploit new styles of hardware. In particular, the technical idea is to reduce the bit-width of the accumulator in dot products and of the representation of all weights. The main observation is that a form of chunking, well known in the optimization and mathematical programming communities, has sufficiently better error properties to train DNNs. The paper is clear, simple, and effective. The title is a bit misleading, since it is really a mixed-precision paper: many of the numbers are actually FP16, not FP8.
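To make the chunking observation concrete, here is a minimal sketch (my own illustration, not the authors' code) of chunk-based accumulation, simulated with NumPy's float16 as a stand-in for a low-precision accumulator. Summing within small chunks first, then summing the per-chunk partials, keeps each addend comparable in magnitude to the running sum, so fewer low-order bits are lost to rounding than in a single long naive accumulation. The chunk size of 64 is an arbitrary choice for the sketch.

```python
import numpy as np


def naive_sum(x):
    # Accumulate everything in one float16 register: once the running
    # total grows large, small addends are rounded away.
    acc = np.float16(0.0)
    for v in x:
        acc = np.float16(acc + v)
    return acc


def chunked_sum(x, chunk=64):
    # Chunk-based accumulation: sum each small chunk in float16,
    # then sum the per-chunk partial results. Intermediate totals
    # stay closer in magnitude to their addends.
    partials = []
    for i in range(0, len(x), chunk):
        acc = np.float16(0.0)
        for v in x[i:i + chunk]:
            acc = np.float16(acc + v)
        partials.append(acc)
    acc = np.float16(0.0)
    for p in partials:
        acc = np.float16(acc + p)
    return acc


rng = np.random.default_rng(0)
x = rng.uniform(0.5, 1.0, size=4096).astype(np.float16)
exact = x.astype(np.float64).sum()
print("naive error:  ", abs(float(naive_sum(x)) - exact))
print("chunked error:", abs(float(chunked_sum(x)) - exact))
```

Running this, the naive float16 sum stalls once the total's rounding unit exceeds the addends, while the chunked sum stays close to the float64 reference; the paper applies the same idea to the FP16 accumulators inside FP8 dot products.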