Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks