Luck Matters: Understanding Training Dynamics of Deep ReLU Networks

Yuandong Tian, Tina Jiang, Qucheng Gong, Ari Morcos

arXiv.org Machine Learning 

We analyze the dynamics of training deep ReLU networks and their implications for generalization. Using a teacher-student setting, we discover a novel relationship between the gradient received by hidden student nodes and the activations of teacher nodes in deep ReLU networks. With this relationship and the assumption that teacher node activations overlap little, we prove that (1) student nodes whose weights are initialized close to teacher nodes converge to them at a faster rate, and (2) in the over-parameterized two-layer case, while a small set of lucky nodes converges to the teacher nodes, the fan-out weights of the remaining nodes converge to zero. This framework provides insight into several puzzling phenomena in deep learning, such as over-parameterization, implicit regularization, and lottery tickets. We verify the small-overlap assumption by showing that the majority of BatchNorm biases in pre-trained VGG11/13/16/19 models are negative: since BatchNorm normalizes its input to zero mean before the bias is added, a negative bias means the subsequent ReLU fires on a minority of inputs, so node activations overlap little.
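To make the teacher-student setting concrete, here is a minimal sketch in PyTorch: a fixed two-layer ReLU teacher labels Gaussian inputs, and an over-parameterized student is trained on those labels with SGD. The network sizes, seed, learning rate, and fresh-sample training loop are illustrative assumptions, not the paper's exact experimental protocol; claim (2) predicts that the fan-out weights of "unlucky" student nodes shrink toward zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_in, m_teacher, m_student = 20, 5, 50  # student is 10x over-parameterized

# Hypothetical two-layer ReLU teacher and student; sizes are illustrative.
teacher = nn.Sequential(nn.Linear(d_in, m_teacher), nn.ReLU(), nn.Linear(m_teacher, 1))
student = nn.Sequential(nn.Linear(d_in, m_student), nn.ReLU(), nn.Linear(m_student, 1))

opt = torch.optim.SGD(student.parameters(), lr=0.05)
for step in range(2000):
    x = torch.randn(256, d_in)      # fresh Gaussian inputs each step
    with torch.no_grad():
        y = teacher(x)              # supervision comes from the fixed teacher
    loss = ((student(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inspect fan-out (second-layer) weight magnitudes per hidden student node:
# a few large values (lucky nodes) and many near-zero ones would be
# consistent with claim (2).
fanout = student[2].weight.detach().abs().squeeze()
print(fanout.sort(descending=True).values)
```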
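The BatchNorm-bias verification is easy to reproduce in spirit with torchvision's batch-normalized VGG variants; the sketch below counts the fraction of negative BatchNorm biases in each. Using torchvision's ImageNet checkpoints is an assumption standing in for the authors' own pre-trained models.

```python
import torch
from torchvision import models

# Batch-normalized VGG variants matching the models named in the abstract.
# "IMAGENET1K_V1" weights are assumed here; older torchvision versions use
# the legacy pretrained=True flag instead.
builders = {
    "VGG11": models.vgg11_bn,
    "VGG13": models.vgg13_bn,
    "VGG16": models.vgg16_bn,
    "VGG19": models.vgg19_bn,
}

for name, build in builders.items():
    net = build(weights="IMAGENET1K_V1")
    biases = torch.cat([
        m.bias.detach()
        for m in net.modules()
        if isinstance(m, torch.nn.BatchNorm2d)
    ])
    frac_neg = (biases < 0).float().mean().item()
    print(f"{name}: {frac_neg:.1%} of {biases.numel()} BatchNorm biases are negative")
```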
