Reviews: Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
Neural Information Processing Systems
The article addresses the problem of understanding how the learning dynamics of deep neural networks depend on both the activation functions used in the different layers and the way the weights are initialized. It is mainly a theoretical paper, with some experiments that confirm the theoretical study. The core of the contribution relies on random matrix theory. In the first section, the paper describes the setup (a deep neural network as a sequence of layers) and the tools that will be used to study its dynamics. The analysis relies mainly on the singular value density of the Jacobian matrix, which is computed by a four-step method proposed in the article.
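To make the central object concrete, here is a minimal numerical sketch of the quantity under study. It is not the four-step analytical method from the paper: it simply builds a random fully connected network, accumulates the Jacobian layer by layer, and computes its singular values empirically. It assumes the Jacobian in question is the input-output Jacobian of the network; the function name `jacobian_singular_values` and its parameters (`depth`, `width`, `init`, `activation`, `sigma_w`, `seed`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def jacobian_singular_values(depth, width, init="orthogonal",
                             activation="tanh", sigma_w=1.0, seed=0):
    """Singular values of the input-output Jacobian of a random deep network.

    Illustrative sketch only: biases are omitted and the input is a random
    Gaussian vector. These are assumptions, not details from the review.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)   # random input vector
    J = np.eye(width)                # Jacobian accumulated so far

    for _ in range(depth):
        if init == "orthogonal":
            # Random orthogonal matrix from a QR decomposition, with a sign
            # correction on the columns, scaled by sigma_w.
            Q, R = np.linalg.qr(rng.standard_normal((width, width)))
            W = sigma_w * Q * np.sign(np.diag(R))
        else:
            # i.i.d. Gaussian entries with variance sigma_w**2 / width.
            W = sigma_w * rng.standard_normal((width, width)) / np.sqrt(width)

        pre = W @ h
        if activation == "tanh":
            h = np.tanh(pre)
            d = 1.0 - h ** 2         # derivative of tanh at the pre-activation
        else:
            h = pre                  # linear layer
            d = np.ones(width)

        # Chain rule: the layer Jacobian is diag(d) @ W.
        J = (d[:, None] * W) @ J

    return np.linalg.svd(J, compute_uv=False)


if __name__ == "__main__":
    for init in ("orthogonal", "gaussian"):
        s = jacobian_singular_values(depth=10, width=50, init=init)
        print(init, "max:", s.max(), "min:", s.min())
```

In the linear, orthogonally initialized case with `sigma_w=1.0`, the Jacobian is a product of orthogonal matrices, so every singular value equals 1. Comparing that baseline against Gaussian initialization or a nonlinear activation gives a rough empirical picture of how the spectrum spreads with depth.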
Oct-8-2024, 10:18:39 GMT