Reviews: On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

Neural Information Processing Systems 

This paper considers the problem of optimizing over measures rather than directly over parameters (the standard approach in ML), for differentiable predictors with a convex loss. This is an infinite-dimensional convex optimization problem. The paper instead considers optimizing with m particles (Dirac deltas). As m tends to infinity, this corresponds to optimizing over the space of measures. Proposition 2.3 establishes existence and uniqueness of the particle gradient flow for a given initialization.
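To make the m-particle discretization concrete, here is a minimal sketch. It rests on assumptions that are our own illustrative choices and are not taken from the review. The predictor is a one-dimensional two-layer tanh network with mean-field scaling, h(x) = (1/m) Σ_i a_i tanh(w_i x), where each particle is u_i = (a_i, w_i). The loss is the convex squared loss. The time-rescaled particle flow u_i' = -m ∇_{u_i} F_m is discretized with explicit Euler steps. The names `predict`, `loss`, `particle_gd_step` and `train`, and the target sin(2x), are hypothetical.

```python
import math
import random


def predict(particles, x):
    """Mean-field predictor h(x) = (1/m) * sum_i a_i * tanh(w_i * x)."""
    return sum(a * math.tanh(w * x) for a, w in particles) / len(particles)


def loss(particles, data):
    """Convex squared loss R(h) = 1/(2n) * sum_j (h(x_j) - y_j)^2."""
    return sum((predict(particles, x) - y) ** 2 for x, y in data) / (2 * len(data))


def particle_gd_step(particles, data, lr):
    """One explicit Euler step of u_i' = -m * grad_{u_i} F_m(u_1, ..., u_m).

    Assumption: the factor m cancels the 1/m in the predictor, so each
    particle moves at O(1) speed regardless of the number of particles.
    """
    n = len(data)
    residuals = [predict(particles, x) - y for x, y in data]
    updated = []
    for a, w in particles:
        grad_a = grad_w = 0.0
        for (x, _), r in zip(data, residuals):
            t = math.tanh(w * x)
            grad_a += r * t / n                      # residual * d/da of a*tanh(w x)
            grad_w += r * a * (1.0 - t * t) * x / n  # residual * d/dw of a*tanh(w x)
        updated.append((a - lr * grad_a, w - lr * grad_w))
    return updated


def train(m=50, steps=200, lr=0.2, seed=0):
    """Run particle gradient descent from m i.i.d. Gaussian particles (hypothetical setup)."""
    rng = random.Random(seed)
    xs = [-1.0 + 2.0 * j / 19 for j in range(20)]
    data = [(x, math.sin(2.0 * x)) for x in xs]
    particles = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(m)]
    initial = loss(particles, data)
    for _ in range(steps):
        particles = particle_gd_step(particles, data, lr)
    return particles, initial, loss(particles, data)


if __name__ == "__main__":
    particles, initial, final = train()
    print(f"m={len(particles)}  initial loss={initial:.4f}  final loss={final:.4f}")
```

Increasing `m` while keeping the same step size illustrates the regime the paper studies. The empirical measure (1/m) Σ_i δ_{u_i} becomes a finer approximation of a measure, and the scaling keeps each particle's dynamics well defined in the limit.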