Towards Understanding the Condensation of Two-layer Neural Networks at Initial Training
Xu, Zhi-Qin John, Zhou, Hanxu, Luo, Tao, Zhang, Yaoyu
Studying the implicit regularization effect of the nonlinear training dynamics of neural networks (NNs) is important for understanding why over-parameterized neural networks often generalize well on real datasets. Empirically, existing works have shown that, with small initialization, the weights of NNs condense on isolated orientations. This condensation dynamics implies that, during training, NNs can learn features from the training data with a network configuration effectively equivalent to a much smaller network. In this work, we show that the multiplicity of the activation function at the origin is a key factor in understanding condensation at the initial stage of training. Our experiments suggest that the maximal number of condensed orientations is twice the multiplicity. Our theoretical analysis confirms the experiments in two cases: one for activation functions of multiplicity one, and the other for one-dimensional input. This work takes a step towards understanding how small initialization implicitly leads NNs to condensation at the initial stage of training, laying a foundation for future study of the nonlinear dynamics of NNs and their implicit regularization effect at later stages of training.
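To make the described phenomenon concrete, below is a minimal sketch (not the authors' code) of the kind of experiment the abstract refers to: a two-layer tanh network f(x) = Σ_k a_k tanh(w_k · x) is trained from a small random initialization on synthetic data, and the pairwise cosine similarities of the input weight vectors w_k are inspected. Condensation appears as the w_k clustering around a few isolated orientations; for tanh (multiplicity one), at most two opposite orientations would be expected. All hyperparameters (width, data, learning rate, initialization scale) are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a condensation experiment with a two-layer tanh network,
# trained by plain gradient descent on a mean-squared error.
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 2, 50, 20                  # input dim, hidden width, number of samples
X = rng.normal(size=(n, d))          # synthetic training inputs
y = np.sin(X @ rng.normal(size=d))   # synthetic scalar targets

init_scale = 1e-4                    # "small initialization" regime
W = init_scale * rng.normal(size=(m, d))   # input weights w_k
a = init_scale * rng.normal(size=m)        # output weights a_k

lr, steps = 0.05, 20000
for _ in range(steps):
    z = X @ W.T            # pre-activations, shape (n, m)
    h = np.tanh(z)         # hidden activations
    f = h @ a              # network output, shape (n,)
    r = f - y              # residuals
    # gradients of the mean-squared error w.r.t. a_k and w_k
    grad_a = h.T @ r * (2 / n)
    grad_W = ((r[:, None] * (1 - h ** 2) * a).T @ X) * (2 / n)
    a -= lr * grad_a
    W -= lr * grad_W

# Cosine similarity between hidden-neuron weight vectors; |cos| near 1 for
# most pairs indicates condensation onto one direction and its opposite.
U = W / np.linalg.norm(W, axis=1, keepdims=True)
cos = U @ U.T
print("fraction of pairs with |cos| > 0.99:",
      np.mean(np.abs(cos[np.triu_indices(m, k=1)]) > 0.99))
```

In practice the learning rate, number of steps, and initialization scale may need tuning for the clustering to become clearly visible, since the dynamics in the small-initialization regime are slow at the start of training.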
arXiv.org Artificial Intelligence
May-29-2021
- Country:
- North America
- Canada (0.29)
- United States (0.28)
- Genre:
- Research Report > Experimental Study (0.34)
- Industry:
- Information Technology > Networks (0.34)
- Telecommunications > Networks (0.34)
- Technology: