Global Convergence of Gradient Descent for Deep Linear Residual Networks
Neural Information Processing Systems
We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization scheme, zero-asymmetric (ZAS) initialization, designed to keep the optimization trajectory away from the stable manifolds of saddle points.
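To make the setting concrete, the following is a minimal sketch of gradient descent on a deep linear residual network whose residual weights start at exactly zero, so the network is the identity map at initialization. This illustrates only the "zero" part of the idea; the paper's full ZAS construction (including the asymmetric output layer for unequal input/output widths) is not reproduced here, and the names, dimensions, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def forward(Ws, X):
    """Deep linear residual network: H_l = (I + W_l) H_{l-1}."""
    H = X
    for W in Ws:
        H = H + W @ H
    return H

def zero_residual_init(depth, d):
    # All residual weights start at zero, so forward(Ws, X) == X
    # at initialization (sketch; the paper's ZAS scheme also
    # specifies an asymmetric final layer, omitted here).
    return [np.zeros((d, d)) for _ in range(depth)]

rng = np.random.default_rng(0)
d, n, depth, lr = 4, 32, 6, 0.02
X = rng.standard_normal((d, n))
A = 0.3 * rng.standard_normal((d, d))
Y = A @ X  # linear regression target

Ws = zero_residual_init(depth, d)

def loss(Ws):
    R = forward(Ws, X) - Y
    return 0.5 * np.sum(R ** 2) / n

init_loss = loss(Ws)
for _ in range(300):
    # Forward pass, caching the input to each residual block.
    Hs = [X]
    for W in Ws:
        Hs.append(Hs[-1] + W @ Hs[-1])
    G = (Hs[-1] - Y) / n  # gradient of the loss w.r.t. the output
    # Backward pass: dL/dW_l = G_l H_{l-1}^T, G_{l-1} = (I + W_l)^T G_l.
    grads = []
    for W, H in zip(reversed(Ws), reversed(Hs[:-1])):
        grads.append(G @ H.T)
        G = G + W.T @ G
    for W, g in zip(Ws, reversed(grads)):
        W -= lr * g

final_loss = loss(Ws)
```

With this zero initialization the iterates never start on (and in this toy run do not approach) a saddle's stable manifold, and the loss decreases monotonically toward the global minimum of the linear regression problem.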
Jan-21-2025, 21:00:31 GMT