Reviews: Riemannian approach to batch normalization
Neural Information Processing Systems
Paper Summary
Starting from the observation that batch normalization induces a particular form of scale invariance on the weight matrix (the normalized output of a unit is unchanged when its incoming weights are rescaled by a positive constant), the authors propose instead to learn the weights directly on the unit sphere. This is motivated by information geometry as an example of optimization on a Riemannian manifold, in particular the Stiefel manifold V(1,n), which contains unit-length vectors. As the descent direction on the unit sphere is well known (Eq. 7), the main contribution of the paper lies in extending popular optimization algorithms (SGD with momentum and Adam) to constrained optimization on the unit sphere. Furthermore, the authors propose orthogonality as a (principled) replacement for L2 regularization, which is no longer meaningful under norm constraints. The method is shown to be effective across two families of models (VGG and Wide ResNet) on CIFAR-10, CIFAR-100 and SVHN.
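To make the summarized ideas concrete, here is a minimal NumPy sketch. It checks the scale invariance of a batch-normalized pre-activation, projects a Euclidean gradient onto the tangent space of the unit sphere, and runs SGD with momentum constrained to the sphere. Several choices here are assumptions, not taken from the paper: renormalization serves as the retraction, and momentum is transported by projecting it onto the new tangent space. Both are common on the sphere but may differ from the paper's exact update rules (including its Eq. 7). The helper names (`batch_norm`, `riemannian_grad`, `retract`, `sphere_momentum_step`, `orthogonality_penalty`), the toy objective, and the Frobenius-norm form of the orthogonality penalty are hypothetical and only for illustration.

```python
import numpy as np


def batch_norm(z, eps=1e-5):
    # Normalize pre-activations over the batch (no learned scale/shift, for illustration).
    return (z - z.mean()) / np.sqrt(z.var() + eps)


def riemannian_grad(w, egrad):
    # Project the Euclidean gradient onto the tangent space of the unit sphere at w
    # by removing its radial component (egrad . w) w.
    return egrad - np.dot(egrad, w) * w


def retract(v):
    # Map a vector back onto the unit sphere by renormalizing it (assumed retraction).
    return v / np.linalg.norm(v)


def sphere_momentum_step(w, egrad, m, lr=0.1, beta=0.9):
    # Hypothetical helper: one step of SGD with momentum constrained to the unit sphere.
    g = riemannian_grad(w, egrad)
    m = beta * m + g               # accumulate momentum in the tangent space at w
    w_new = retract(w - lr * m)    # move along the tangent direction, then renormalize
    m = riemannian_grad(w_new, m)  # transport momentum by projecting onto the new tangent space
    return w_new, m


def orthogonality_penalty(W):
    # Assumed soft penalty ||W W^T - I||_F^2; for unit-norm rows it sums the
    # squared cosines between distinct weight vectors.
    gram = W @ W.T
    return float(np.sum((gram - np.eye(W.shape[0])) ** 2))


rng = np.random.default_rng(0)

# 1) Scale invariance: scaling w by c > 0 leaves the batch-normalized output
#    unchanged up to the small effect of eps.
X = rng.normal(size=(64, 3))
w = retract(rng.normal(size=3))
print(np.allclose(batch_norm(X @ w), batch_norm(X @ (3.0 * w)), atol=1e-4))

# 2) Toy problem on the sphere: minimize f(w) = ||w - t||^2 subject to ||w|| = 1.
t = np.array([0.6, 0.0, 0.8])  # unit-norm target, so the constrained minimizer is t itself
w, m = np.array([1.0, 0.0, 0.0]), np.zeros(3)
for _ in range(1000):
    w, m = sphere_momentum_step(w, 2.0 * (w - t), m)  # 2 (w - t) is the Euclidean gradient
print(np.round(w, 4), np.linalg.norm(w))
```

Every step ends with renormalization, so ||w|| stays at 1. An L2 penalty on `w` would therefore be constant, which is the sense in which L2 regularization stops being meaningful. A penalty on the angles between weight vectors, such as the orthogonality term above, can take its place.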
Oct-7-2024, 18:28:27 GMT