Reviews: Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?

Neural Information Processing Systems 

The paper gives a theoretical study of the exploding and vanishing gradient problem (EVGP) in deep fully connected ReLU networks. As tractable proxies for whether the EVGP has been avoided, the paper proposes two criteria: the annealed EVGP and the quenched EVGP. It is then shown that both criteria are met if the sum of the reciprocals of the layer widths is small (so the width of every layer should ideally be large). To confirm this empirically, the paper uses an experiment from a concurrent work.

Comments: To motivate a formal study of the EVGP in deep networks, the authors refer to papers which suggest looking at the distribution of singular values of the input-output Jacobian.
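As a rough illustration of the quantities discussed above (this is not the paper's own experiment), the sketch below builds a randomly initialized fully connected ReLU network with He-style weights, computes its input-output Jacobian at a random input, and prints the spread of the Jacobian's singular values next to the sum of reciprocal hidden-layer widths. The particular depths, widths, and initialization are arbitrary choices made here for illustration.

```python
import numpy as np

def relu_net_jacobian(widths, x, rng):
    """Input-output Jacobian of a randomly initialized fully connected
    ReLU network (He-style weights, linear output layer) at input x."""
    J = np.eye(widths[0])
    h = x
    for i, (n_in, n_out) in enumerate(zip(widths[:-1], widths[1:])):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))
        pre = W @ h
        if i < len(widths) - 2:                 # hidden layer: apply ReLU
            D = np.diag((pre > 0).astype(float))  # derivative of ReLU
            h = np.maximum(pre, 0.0)
            J = D @ W @ J                       # chain rule through this layer
        else:                                   # output layer: linear
            h = pre
            J = W @ J
    return J

rng = np.random.default_rng(0)
d_in, d_out, depth = 10, 10, 20
for name, width in [("narrow", 10), ("wide", 500)]:
    widths = [d_in] + [width] * depth + [d_out]
    recip_sum = sum(1.0 / n for n in widths[1:-1])  # sum of 1/n_j over hidden layers
    x = rng.normal(size=d_in)
    sv = np.linalg.svd(relu_net_jacobian(widths, x, rng), compute_uv=False)
    print(f"{name:6s} sum(1/n_j) = {recip_sum:.2f}  "
          f"singular values in [{sv.min():.2e}, {sv.max():.2e}]")
```

With equal depth, the narrow architecture has a much larger sum of reciprocal widths and its Jacobian singular values are typically far more spread out (or collapse toward zero), which is the qualitative behavior the paper's criterion is meant to capture.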