On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport