Reviews: The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

Neural Information Processing Systems 

This well-written paper is the latest in a series of works that analyze how signals propagate in random neural networks by tracking the mean and variance of activations and gradients under random inputs and weights. The technical contribution can be considered incremental with respect to this series of works. However, while the techniques used are not new, the analysis leads to new insights on the use of batch/layer normalization. In particular, it provides a close look at the mechanisms that lead to pathological sharpness in DNNs, showing that mean subtraction is the main ingredient in countering these mechanisms. While these claims would have to be verified in more complicated settings (e.g. with more complicated distributions on inputs and weights), it is an important first step to know that they hold for such simple networks.