AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights
