Theoretical Understanding of Batch-normalization: A Markov Chain Perspective
Daneshmand, Hadi; Kohler, Jonas; Bach, Francis; Hofmann, Thomas; Lucchi, Aurelien
Batch-normalization (BN) is a key component for effectively training deep neural networks. Empirical evidence has shown that, without BN, the training process is prone to instabilities. This is, however, not well understood from a theoretical point of view. Leveraging tools from Markov chain theory, we show that BN has a direct effect on the rank of the pre-activation matrices of a neural network. Specifically, while deep networks without BN exhibit rank collapse and poor training performance, networks equipped with BN have a higher rank. In an extensive set of experiments on standard neural network architectures and datasets, we show that the rank of the pre-activation matrices is a good predictor for the optimization speed of training.
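The rank effect described in the abstract can be illustrated with a small numerical experiment. The following is a minimal sketch, not the authors' experimental setup: it assumes a deep linear network with i.i.d. Gaussian weights, a standard mean/variance batch normalization without learned scale or shift, and a relative singular-value threshold as the rank measure. The function names (`batch_norm`, `numerical_rank`, `final_rank`) and all sizes (depth 3000, width 16, batch 32) are hypothetical choices made for this illustration.

```python
import numpy as np


def batch_norm(h, eps=1e-5):
    """Normalize each hidden unit (row) to zero mean and unit variance
    across the batch (columns). No learned scale or shift (assumption)."""
    mean = h.mean(axis=1, keepdims=True)
    std = h.std(axis=1, keepdims=True)
    return (h - mean) / (std + eps)


def numerical_rank(h, tol=1e-6):
    """Count singular values larger than tol times the largest one."""
    s = np.linalg.svd(h, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


def final_rank(use_bn, depth=3000, width=16, batch=32, seed=0):
    """Propagate a random batch through a deep linear network and return
    the numerical rank of the last pre-activation matrix (width x batch)."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((width, batch))
    for _ in range(depth):
        # Fresh Gaussian weight matrix per layer, scaled by 1/sqrt(width).
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = w @ h
        if use_bn:
            h = batch_norm(h)
        else:
            # Rescale by the Frobenius norm only to avoid underflow;
            # a scalar rescaling does not change the rank.
            h = h / np.linalg.norm(h)
    return numerical_rank(h)


if __name__ == "__main__":
    print("rank without BN:", final_rank(use_bn=False))
    print("rank with BN:   ", final_rank(use_bn=True))
```

Under these assumptions, the run without BN should end with a much lower numerical rank than the run with BN, which is the qualitative behavior the abstract describes. The threshold `tol` and the depth affect the exact numbers, so the sketch is meant as an intuition aid rather than a reproduction of the paper's results.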
Mar-3-2020
- Country:
- Europe > Switzerland > Zürich > Zürich (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Technology: