Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks