Implicit Regularization and Convergence for Weight Normalization

Dec-23-2025, 20:06:38 GMT–Neural Information Processing Systems

Normalization methods such as batch, weight, instance, and layer normalization are commonly used in modern machine learning. Here, we study the weight normalization (WN) method \cite{salimans2016weight} and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least squares regression and some more general loss functions. WN and rPGD reparametrize the weights with a scale $g$ and a unit vector such that the objective function becomes \emph{non-convex}. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. These methods adaptively regularize the weights and \emph{converge linearly} close to the minimum $\ell_2$ norm solution even for initializations far from zero.

implicit regularization and convergence, name change, weight normalization, (6 more...)

Neural Information Processing Systems

Dec-23-2025, 20:06:38 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)