Towards Understanding the Condensation of Two-layer Neural Networks at Initial Training
Xu, Zhi-Qin John, Zhou, Hanxu, Luo, Tao, Zhang, Yaoyu
Studying the implicit regularization effect of the nonlinear training dynamics of neural networks (NNs) is important for understanding why over-parameterized neural networks often generalize well on real datasets. Empirically, existing works have shown that, with small initialization, the weights of NNs condense on isolated orientations. This condensation dynamics implies that, during training, NNs can learn features from the training data with a network configuration effectively equivalent to a much smaller network. In this work, we show that the multiplicity of the activation function at the origin is a key factor in understanding condensation at the initial stage of training. Our experiments suggest that the maximal number of condensed orientations is twice the multiplicity. Our theoretical analysis confirms the experiments in two cases: one for activation functions of multiplicity one, and the other for one-dimensional input. This work takes a step towards understanding how small initialization implicitly leads NNs to condensation at the initial stage of training, laying a foundation for future study of the nonlinear dynamics of NNs and their implicit regularization effect at later stages of training.
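To make the described phenomenon concrete, below is a minimal sketch (not the authors' code) of the kind of experiment the abstract refers to: a two-layer tanh network f(x) = Σ_k a_k tanh(w_k · x) is trained from a small random initialization on synthetic data, and the pairwise cosine similarities of the input weight vectors w_k are inspected. Condensation appears as the w_k clustering around a few isolated orientations; for tanh (multiplicity one), at most two opposite orientations would be expected. All hyperparameters (width, data, learning rate, initialization scale) are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a condensation experiment with a two-layer tanh network,
# trained by plain gradient descent on a mean-squared error.
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 2, 50, 20                  # input dim, hidden width, number of samples
X = rng.normal(size=(n, d))          # synthetic training inputs
y = np.sin(X @ rng.normal(size=d))   # synthetic scalar targets

init_scale = 1e-4                    # "small initialization" regime
W = init_scale * rng.normal(size=(m, d))   # input weights w_k
a = init_scale * rng.normal(size=m)        # output weights a_k

lr, steps = 0.05, 20000
for _ in range(steps):
    z = X @ W.T            # pre-activations, shape (n, m)
    h = np.tanh(z)         # hidden activations
    f = h @ a              # network output, shape (n,)
    r = f - y              # residuals
    # gradients of the mean-squared error w.r.t. a_k and w_k
    grad_a = h.T @ r * (2 / n)
    grad_W = ((r[:, None] * (1 - h ** 2) * a).T @ X) * (2 / n)
    a -= lr * grad_a
    W -= lr * grad_W

# Cosine similarity between hidden-neuron weight vectors; |cos| near 1 for
# most pairs indicates condensation onto one direction and its opposite.
U = W / np.linalg.norm(W, axis=1, keepdims=True)
cos = U @ U.T
print("fraction of pairs with |cos| > 0.99:",
      np.mean(np.abs(cos[np.triu_indices(m, k=1)]) > 0.99))
```

In practice the learning rate, number of steps, and initialization scale may need tuning for the clustering to become clearly visible, since the dynamics in the small-initialization regime are slow at the start of training.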
arXiv.org Artificial Intelligence
May-29-2021
- Country:
- North America
- Canada (0.29)
- United States (0.28)
- Genre:
- Research Report > Experimental Study (0.34)
- Industry:
- Information Technology > Networks (0.34)
- Telecommunications > Networks (0.34)
- Technology: