Reviews: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
–Neural Information Processing Systems
The suggested reparametrisation and its theoretical analysis are very interesting and I enjoyed reading the paper. However, some points in the theoretical analysis could be improved. The paper argues that the new parametrisation improves the conditioning of the gradient, but neither a strong theoretical argument nor an empirical demonstration of this is given. In line 127 it is said that "Empirically, we find that w is often (close to) a dominant eigenvector of the covariance matrix C", but the corresponding experiments are shown neither in the paper nor in the supplemental material. In lines 122/123 the authors claim: "It has been observed that neural networks with batch normalization also have this property (to be relatively insensitive to different learning rates), which can be explained by this analysis." However, it did not become clear to me how the analysis of the previous sections can be transferred directly to batch normalisation.
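For context, the paper's reparametrisation writes each weight vector as w = g·v/‖v‖. A minimal NumPy sketch (variable names are mine) of this reparametrisation and the resulting gradients; the fact that the gradient with respect to v is orthogonal to v is the property underlying the conditioning argument discussed above:

```python
import numpy as np

def weight_norm(v, g):
    """Weight-normalised weight vector: w = g * v / ||v||."""
    return g * v / np.linalg.norm(v)

def weight_norm_grads(v, g, grad_w):
    """Gradients of a loss L w.r.t. g and v, given grad_w = dL/dw.

    From the paper's derivation:
        dL/dg = (dL/dw) . v / ||v||
        dL/dv = (g / ||v||) * dL/dw - (g * dL/dg / ||v||^2) * v
    """
    norm = np.linalg.norm(v)
    grad_g = grad_w @ v / norm
    # The second term projects out the component of grad_w along v,
    # so grad_v is always orthogonal to v (and hence to w).
    grad_v = (g / norm) * grad_w - (g * grad_g / norm**2) * v
    return grad_g, grad_v
```

A quick numerical check confirms the orthogonality: for any v, g, and upstream gradient, `grad_v @ v` is zero up to floating-point error, and ‖w‖ = |g| by construction.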