Mixed Precision Training - Baidu Research
Figure 2: Mixed precision training for deep learning models.

Secondly, we introduce a technique called loss scaling that allows us to recover some of the small-valued gradients. During training, some weight gradients have very small exponents that become zero in FP16 format. To overcome this problem, we scale the loss by a scaling factor at the start of back-propagation. Through the chain rule, the gradients are scaled up by the same factor and become representable in FP16.
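The effect described above can be sketched numerically. The snippet below is an illustration, not the authors' implementation: the scaling factor of 1024 is a hypothetical choice, and NumPy's `float16` cast stands in for FP16 gradient storage. A gradient smaller than FP16's smallest subnormal (about 2^-24) flushes to zero, but the same gradient multiplied by the loss scale survives the cast and can be unscaled in FP32 before the weight update.

```python
import numpy as np

SCALE = 1024.0  # hypothetical loss-scaling factor (2^10)

# A gradient of 2^-27 is fine in FP32 but below FP16's
# smallest subnormal (2^-24), so the FP16 cast flushes it to zero.
grad_fp32 = np.float32(2.0 ** -27)
grad_fp16 = np.float16(grad_fp32)
assert grad_fp16 == 0.0

# Scaling the loss scales every gradient by the chain rule;
# 2^-27 * 2^10 = 2^-17, which FP16 can represent exactly.
scaled_fp16 = np.float16(grad_fp32 * SCALE)
assert scaled_fp16 != 0.0

# Before the weight update, unscale in FP32 to recover the
# original gradient value.
recovered = np.float32(scaled_fp16) / np.float32(SCALE)
assert recovered == grad_fp32
```

Because both the gradient and the scale here are powers of two, the unscaled value matches the original exactly; in general the round trip is accurate to FP16 precision.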
Feb-11-2018, 23:31:47 GMT