Global Convergence of Gradient Descent for Deep Linear Residual Networks

Lei Wu, Qingcan Wang, Chao Ma

Neural Information Processing Systems (NeurIPS)

We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization. The scheme is motivated by avoiding the stable manifolds of saddle points: a gradient-descent trajectory started on such a manifold converges to the saddle rather than to a global minimum.
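As a concrete illustration, below is a minimal NumPy sketch of one plausible reading of ZAS initialization for a linear residual network of the form $y = W_{\text{out}}(I + A_L)\cdots(I + A_1)x$: every residual block $A_l$ starts at exactly zero, and the output layer is the asymmetric projection $[I, 0]$ rather than a symmetric random matrix. The network form, shapes, and function names (`zas_init`, `forward`) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def zas_init(d_in, d_out, depth):
    """Sketch of zero-asymmetric (ZAS) initialization (assumed reading).

    All residual blocks A_l are initialized to zero, so the network
    initially computes W_out @ x; the output layer is the asymmetric
    projection [I, 0] (assumes d_out <= d_in).
    """
    blocks = [np.zeros((d_in, d_in)) for _ in range(depth)]
    w_out = np.hstack([np.eye(d_out), np.zeros((d_out, d_in - d_out))])
    return blocks, w_out

def forward(x, blocks, w_out):
    """Linear residual forward pass: h <- (I + A_l) h, then y = W_out h."""
    h = x
    for a in blocks:
        h = h + a @ h
    return w_out @ h

# Usage: a depth-10 network mapping R^4 -> R^2; at initialization the
# output is simply the first two coordinates of the input.
blocks, w_out = zas_init(d_in=4, d_out=2, depth=10)
print(forward(np.arange(4.0), blocks, w_out))  # [0. 1.]
```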