Mixed Precision Training - Baidu Research


Figure 2: Mixed precision training for deep learning models.

Secondly, we introduce a technique called loss-scaling that allows us to recover some of the small-valued gradients. During training, some weight gradients have exponents so small that they become zero in FP16 format. To overcome this, we scale the loss by a scaling factor at the start of back-propagation. Through the chain rule, the gradients are scaled up by the same factor and become representable in FP16.
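The underflow problem and its fix can be sketched numerically. This is a minimal illustration, not the paper's implementation: it assumes a single gradient value and a hypothetical scaling factor of 1024, and shows that a gradient too small for FP16 survives the cast once the loss (and hence, by the chain rule, the gradient) is scaled.

```python
import numpy as np

# A gradient value below FP16's smallest subnormal (~6e-8):
grad_fp32 = np.float32(1e-8)

# Cast directly to half precision: the value underflows to zero.
print(np.float16(grad_fp32))        # 0.0 -- the gradient is lost

# Loss-scaling sketch: multiply the loss by a factor S before
# back-propagation. By the chain rule, every gradient is also
# multiplied by S, moving it into FP16's representable range.
S = np.float32(1024.0)              # hypothetical scaling factor
scaled = np.float16(grad_fp32 * S)  # 1.024e-5, representable in FP16
print(scaled != 0.0)                # True -- the gradient survives

# Before the weight update, unscale in FP32 to recover the value.
recovered = np.float32(scaled) / S
```

In practice the unscaling step is applied to the full-precision copy of the gradients before the optimizer update, so the scaling factor never affects the learned weights.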
