Large-batch Optimization for Dense Visual Predictions

Feb-19-2026, 06:21:58 GMT–Neural Information Processing Systems

At the t-th backward propagation step, we can derive the gradient $\nabla l^{(i)}(w_t)$ to update the i-th module in M. The number in brackets represents the batch size. We see that when the batch size is small (i.e., 32), the gradient variances are similar. N and K denote the number of FPN levels and the number of region proposals fed into the detection head, respectively. To evaluate this assumption, we make three observations from Figure 1. As illustrated by the second panel of Figure 1, the gradient misalignment between the detection head and the backbone has been reduced.
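To make the per-module comparison concrete, here is a minimal sketch of one way to estimate a gradient variance for each module from gradients collected over several mini-batches. It is an illustration, not the paper's procedure. The assumptions are the variance definition (the trace of the empirical covariance), the function names `gradient_variance` and `per_module_variance`, and the module names and numbers in the usage note, all of which are hypothetical.

```python
from statistics import fmean


def gradient_variance(grad_samples):
    """Estimate the variance of a gradient from several mini-batch samples.

    grad_samples: list of equal-length lists of floats, one flattened
    gradient per mini-batch. Returns the trace of the (biased) empirical
    covariance, i.e. the mean squared distance to the mean gradient.
    This definition is an assumption made for illustration.
    """
    n = len(grad_samples)
    dim = len(grad_samples[0])
    # Coordinate-wise mean gradient across mini-batches.
    mean = [fmean(g[j] for g in grad_samples) for j in range(dim)]
    # Sum of squared deviations over all samples and coordinates.
    total = sum((g[j] - mean[j]) ** 2 for g in grad_samples for j in range(dim))
    return total / n


def per_module_variance(grads_by_module):
    """Map each module name to the variance of its sampled gradients."""
    return {
        name: gradient_variance(samples)
        for name, samples in grads_by_module.items()
    }
```

Calling `per_module_variance({"backbone": [[1.0, 0.0], [3.0, 0.0]], "head": [[2.0, 2.0], [2.0, 2.0]]})` with these made-up samples gives one value per module, which can then be compared across modules and batch sizes.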

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.95)
