Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks

Sun, Xiao, Choi, Jungwook, Chen, Chia-Yu, Wang, Naigang, Venkataramani, Swagath, Srinivasan, Vijayalakshmi (Viji), Cui, Xiaodong, Zhang, Wei, Gopalakrishnan, Kailash

Neural Information Processing Systems 

Reducing the numerical precision of data and computation is extremely effective in accelerating deep learning training workloads. Towards this end, 8-bit floating point representations (FP8) were recently proposed for DNN training. However, their applicability was demonstrated on only a few selected models, and significant degradation is observed when popular networks such as MobileNet and Transformer are trained using FP8. This degradation stems from the differing precision requirements of the forward and backward passes of DNN training. Using theoretical insights, we propose a hybrid FP8 (HFP8) format and an end-to-end distributed DNN training procedure.
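The key idea of a hybrid format is that forward and backward passes can use different exponent/mantissa splits of the same 8-bit budget. Below is a minimal, illustrative sketch (not the authors' implementation) of a generic FP quantizer parameterized by exponent and mantissa bit counts; the 1-4-3 forward and 1-5-2 backward splits used in the usage example are the commonly cited HFP8 configuration and are shown here purely for illustration.

```python
import numpy as np

def quantize_fp(x, exp_bits, man_bits, exp_bias=None):
    """Round x to the nearest value representable in a 1/exp_bits/man_bits
    (sign/exponent/mantissa) floating-point format, saturating overflow at the
    largest normal value. A sketch for illustration, not a bit-exact emulator."""
    x = np.asarray(x, dtype=np.float32)
    if exp_bias is None:
        exp_bias = 2 ** (exp_bits - 1) - 1            # IEEE-style default bias
    e_min = 1 - exp_bias                              # smallest normal exponent
    e_max = (2 ** exp_bits - 2) - exp_bias            # largest normal exponent
    max_normal = (2.0 - 2.0 ** (-man_bits)) * 2.0 ** e_max

    sign = np.sign(x)
    mag = np.abs(x)
    # Exponent of each value, clamped to the normal range; values below the
    # smallest normal fall onto the subnormal grid via the mantissa rounding.
    exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.clip(exp, e_min, e_max)
    # Round the mantissa to man_bits fractional bits at that exponent.
    scale = 2.0 ** (exp - man_bits)
    q = np.round(mag / scale) * scale
    # Saturate instead of overflowing to infinity.
    q = np.minimum(q, max_normal)
    return (sign * q).astype(np.float32)

# Hypothetical hybrid use: a narrower-exponent format for forward-pass tensors,
# a wider-exponent format for backward-pass gradients (bit splits illustrative).
weights = (np.random.randn(4, 4) * 0.05).astype(np.float32)
grads   = (np.random.randn(4, 4) * 1e-4).astype(np.float32)

fwd_q = quantize_fp(weights, exp_bits=4, man_bits=3)   # 1-4-3 "forward" format
bwd_q = quantize_fp(grads,   exp_bits=5, man_bits=2)   # 1-5-2 "backward" format
```

The design point the sketch illustrates is the trade-off the abstract alludes to: the forward pass benefits from more mantissa bits (resolution), while gradients in the backward pass span a much wider dynamic range and benefit from more exponent bits.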