Gradient Descent Provably Optimizes Over-parameterized Neural Networks