Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

Jeffrey Pennington, Samuel Schoenholz, Surya Ganguli

May-28-2025, 04:49:31 GMT–Neural Information Processing Systems

It is well known that weight initialization in deep networks can have a dramatic impact on learning speed. For example, ensuring the mean squared singular value of a network's input-output Jacobian is O(1) is essential for avoiding exponentially vanishing or exploding gradients. Moreover, in deep linear networks, ensuring that all singular values of the Jacobian are concentrated near 1 can yield a dramatic additional speed-up in learning; this is a property known as dynamical isometry. However, it is unclear how to achieve dynamical isometry in nonlinear deep networks. We address this question by employing powerful tools from free probability theory to analytically compute the entire singular value distribution of a deep network's input-output Jacobian.
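The quantity at the center of the abstract, the singular value spectrum of the input-output Jacobian, can be inspected numerically. The following is a minimal sketch rather than the authors' method: it assumes a fully connected tanh network of constant width, and the function name `jacobian_singular_values`, the weight scale `sigma_w`, the bias scale `sigma_b`, and the orthogonal-versus-Gaussian switch are all illustrative choices not taken from the text above. It accumulates the Jacobian layer by layer as a product of `D W` factors, where `D` is the diagonal matrix of activation derivatives, and returns its singular values.

```python
import numpy as np

def jacobian_singular_values(depth, width, sigma_w=1.05, sigma_b=0.0,
                             orthogonal=True, seed=0):
    """Singular values of the input-output Jacobian of a random tanh network.

    All parameter names and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)   # random input with unit-variance entries
    J = np.eye(width)                # running Jacobian d h_l / d h_0
    for _ in range(depth):
        if orthogonal:
            # Random orthogonal matrix from a QR decomposition, scaled by sigma_w.
            Q, R = np.linalg.qr(rng.standard_normal((width, width)))
            W = sigma_w * Q * np.sign(np.diag(R))
        else:
            # Gaussian weights with entry variance sigma_w**2 / width.
            W = sigma_w * rng.standard_normal((width, width)) / np.sqrt(width)
        b = sigma_b * rng.standard_normal(width)
        h = np.tanh(W @ h + b)
        D = 1.0 - h**2               # tanh'(z) = 1 - tanh(z)**2
        J = (D[:, None] * W) @ J     # chain rule: J <- diag(D) W J
    return np.linalg.svd(J, compute_uv=False)

if __name__ == "__main__":
    for orth in (True, False):
        s = jacobian_singular_values(depth=32, width=200, orthogonal=orth)
        print("orthogonal" if orth else "gaussian",
              "mean squared singular value:", np.mean(s**2),
              "max:", s.max(), "min:", s.min())
```

Comparing the printed spread of singular values for the two initializations gives a rough empirical view of how close a given configuration is to dynamical isometry; the specific depth, width, and scale used here are arbitrary.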

artificial intelligence, dynamical isometry, machine learning


