Theoretical Understanding of Batch-normalization: A Markov Chain Perspective

Daneshmand, Hadi, Kohler, Jonas, Bach, Francis, Hofmann, Thomas, Lucchi, Aurelien

Mar-3-2020–arXiv.org Machine Learning

Batch-normalization (BN) is a key component for effectively training deep neural networks. Empirical evidence has shown that, without BN, the training process is prone to instabilities. However, this is not well understood from a theoretical point of view. Leveraging tools from Markov chain theory, we show that BN has a direct effect on the rank of the pre-activation matrices of a neural network. Specifically, while deep networks without BN exhibit rank collapse and poor training performance, networks equipped with BN have a higher rank. In an extensive set of experiments on standard neural network architectures and datasets, we show that this rank is a good predictor of the optimization speed of training.
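As a rough illustration of the rank claim in the abstract, the sketch below pushes a random batch through a deep stack of random linear layers, with and without normalization, and reports a rank proxy for the final pre-activation matrix. Everything in it is an assumption made for illustration, not the authors' experimental setup: the layers are purely linear with Gaussian weights, `batch_norm` is a plain per-unit standardization without learned scale or shift, and the rank proxy is the stable rank (squared Frobenius norm over squared spectral norm), estimated by power iteration. The helper names (`simulate`, `stable_rank`, and so on) are hypothetical.

```python
import math
import random


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def gaussian_matrix(rows, cols, rng, scale=1.0):
    """Draw a rows x cols matrix with i.i.d. N(0, scale^2) entries."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def batch_norm(h, eps=1e-12):
    """Standardize each row (one hidden unit across the batch).

    Assumption: no learned scale/shift, statistics taken over the batch.
    """
    out = []
    for row in h:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        out.append([(x - mean) / math.sqrt(var + eps) for x in row])
    return out


def stable_rank(h, iters=200):
    """Estimate ||H||_F^2 / ||H||_2^2, a smooth proxy for the rank of H."""
    gram = matmul(h, transpose(h))  # H H^T, symmetric positive semidefinite
    rng = random.Random(0)
    v = [rng.gauss(0.0, 1.0) for _ in gram]
    top = 0.0  # running estimate of the largest eigenvalue of H H^T
    for _ in range(iters):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        top = norm
    if top == 0.0:
        return 0.0
    frob_sq = sum(gram[i][i] for i in range(len(gram)))  # trace = ||H||_F^2
    return frob_sq / top


def simulate(depth, width=8, batch=16, use_bn=False, seed=0):
    """Propagate a random batch through `depth` random linear layers."""
    rng = random.Random(seed)
    h = gaussian_matrix(width, batch, rng)
    for _ in range(depth):
        w = gaussian_matrix(width, width, rng, scale=1.0 / math.sqrt(width))
        h = matmul(w, h)  # pre-activations of the next layer
        if use_bn:
            h = batch_norm(h)
    return stable_rank(h)


if __name__ == "__main__":
    for depth in (1, 10, 100):
        print(depth, simulate(depth, use_bn=False), simulate(depth, use_bn=True))
```

Without normalization, the stable rank of a long product of Gaussian matrices drifts toward 1, which is the rank collapse the abstract describes. Whether the normalized variant retains a visibly larger value in this toy setting depends on the width, depth, and seed, so no expected output is shown; the sketch is meant only as a way to probe the claim, not to reproduce the paper's results.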

artificial intelligence, machine learning, matrix


Country:
- Europe > Switzerland > Zürich > Zürich (0.04)

Genre:
- Research Report > New Finding (0.93)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.88)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.61)
