Large-batch Optimization for Dense Visual Predictions

Neural Information Processing Systems 

At the t-th backward propagation step, we can derive the gradient of the loss l(w_t) used to update the i-th module in M. The number in the bracket represents the batch size. We see that when the batch size is small (i.e., 32), the gradient variances are similar. N and K indicate the number of FPN levels and the number of region proposals fed into the detection head, respectively. To evaluate this assumption, as shown in Figure 1, we have three observations. As illustrated by the second panel of Figure 1, the gradient misalignment phenomenon between the detection head and the backbone has been reduced.
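To make the notion of per-module gradient variance concrete, here is a minimal sketch in Python with NumPy. It is not the paper's estimator, which this excerpt does not show. It assumes the variance of a module is measured as the mean squared deviation of its mini-batch gradients from their average. The function names `gradient_variance` and `module_variances`, the module labels, and the toy gradients are all hypothetical.

```python
import numpy as np


def gradient_variance(per_batch_grads):
    """Estimate E||g - E[g]||^2 from a stack of mini-batch gradients.

    per_batch_grads has shape (num_batches, num_params): each row is the
    flattened gradient of one module computed on one mini-batch.
    """
    g = np.asarray(per_batch_grads, dtype=np.float64)
    mean_grad = g.mean(axis=0)
    # Squared distance of each mini-batch gradient from the mean gradient,
    # averaged over mini-batches.
    return float(((g - mean_grad) ** 2).sum(axis=1).mean())


def module_variances(grads_by_module):
    """Apply gradient_variance to every module in a {name: grads} mapping."""
    return {name: gradient_variance(g) for name, g in grads_by_module.items()}


# Toy data (assumption): synthetic gradients with different spreads stand in
# for the backbone and the detection head.
rng = np.random.default_rng(0)
toy_grads = {
    "backbone": rng.normal(0.0, 0.1, size=(8, 16)),
    "head": rng.normal(0.0, 0.5, size=(8, 16)),
}
variances = module_variances(toy_grads)
```

Comparing the entries of `variances` across modules is one simple way to see whether the gradient variances are similar or misaligned for a given batch size.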
