Reviews: Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks

Neural Information Processing Systems 

Originality - This basically amounts to using two different floating-point formats, one for the forward pass and one for the backward pass. Another way to think about it is that we are allowing more freedom in where the mantissa/exponent divide falls within the floating-point word. That's a good observation to have, theoretically, but how would a framework implement this, practically? For example, maybe I missed it, but I don't see how the conversion between the 1-4-3 and 1-5-2 formats is done when preparing for backprop if this were to be productized. Would frameworks now have to support two more data types?
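To make the conversion question concrete, here is a minimal sketch of what moving a value between the two FP8 layouts could look like. This is not the paper's implementation; the exponent biases (7 for 1-4-3, 15 for 1-5-2, i.e. the standard 2^(e-1)-1 choice) are my assumptions, special values (Inf/NaN) are ignored, and rounding is done by exhaustive search over all 256 codes purely for clarity:

```python
def decode_fp8(byte, exp_bits, man_bits, bias):
    """Decode an 8-bit code with 1 sign bit, exp_bits exponent bits,
    and man_bits mantissa bits into a Python float.
    Subnormals handled; Inf/NaN codes are treated as finite for simplicity."""
    sign = (byte >> (exp_bits + man_bits)) & 1
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == 0:  # subnormal: no implicit leading 1
        value = (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    else:         # normal: implicit leading 1
        value = (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)
    return -value if sign else value

def encode_fp8(x, exp_bits, man_bits, bias):
    """Round x to the nearest representable code by brute force
    over all 256 bit patterns (illustrative, not efficient)."""
    return min(range(256),
               key=lambda b: abs(decode_fp8(b, exp_bits, man_bits, bias) - x))

def convert_143_to_152(byte):
    """Re-encode a 1-4-3 code as a 1-5-2 code: decode to a wider type,
    then round into the destination format. Biases are assumed, not
    taken from the paper."""
    return encode_fp8(decode_fp8(byte, 4, 3, 7), 5, 2, 15)
```

The point of the sketch is that the conversion itself is just a decode/round/re-encode through a wider intermediate type; the open question in the review stands, namely whether frameworks would need to expose both layouts as first-class dtypes or hide this cast inside the backward-pass kernels.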