Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks

Neural Information Processing Systems 

This paper studies the convergence of gradient descent (GD) and stochastic gradient descent (SGD) when training mildly parameterized neural networks from random initialization.