Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay
Neural Information Processing Systems
Under Assumptions 2.1, 2.3, 5.1, 5.2 and 5.3, let $x_{\eta,\lambda}(0) = X_{\eta,\lambda}(0) = x_{\mathrm{init}} \in U$ for all $\eta, \lambda > 0$ for SGD+WD (2) and SDE (3).
Feb-8-2026, 11:04:33 GMT