

Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced

Neural Information Processing Systems

We study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions, including feed-forward fully connected and convolutional deep neural networks with linear, ReLU, or Leaky ReLU activation. We rigorously prove that gradient flow (i.e., gradient descent with infinitesimal step size) keeps the differences between the squared norms of the weights across different layers invariant throughout training, so the layers stay automatically balanced without any explicit regularization.
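As a quick illustration of the balancedness property (my own sketch, not code from the paper, assuming a two-layer ReLU network trained by plain gradient descent with a very small step size as a stand-in for gradient flow), one can check numerically that the gap between the squared Frobenius norms of the two layers barely moves during training:

```python
import numpy as np

# Minimal sketch (not the paper's code): approximate gradient flow with a tiny
# step size on a two-layer ReLU network and track the gap between the squared
# Frobenius norms of the two layers.
rng = np.random.default_rng(0)
n, d, h = 64, 5, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=(n, 1))

W1 = rng.normal(scale=0.5, size=(d, h))    # first-layer weights
W2 = rng.normal(scale=0.5, size=(h, 1))    # second-layer weights
eta = 1e-4                                 # tiny step, mimicking gradient flow

gaps = []
for t in range(20000):
    H = np.maximum(X @ W1, 0.0)            # ReLU hidden activations
    err = H @ W2 - y                       # residuals
    gW2 = H.T @ err / n                    # gradients of 0.5 * mean squared error
    gW1 = X.T @ ((err @ W2.T) * (H > 0)) / n
    W1 -= eta * gW1
    W2 -= eta * gW2
    gaps.append(np.sum(W2**2) - np.sum(W1**2))

print("norm gap at start:", gaps[0])
print("norm gap at end:  ", gaps[-1])      # stays (almost) constant
```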


On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

Neural Information Processing Systems

Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.
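To make the discretization concrete, here is a minimal numpy sketch (my illustration, not the paper's code, using a synthetic ReLU regression task): a one-hidden-layer network is written as a mixture of particles, each with a weight and a position, and plain gradient descent is run on both; with many particles and small steps this discretizes the gradient flow on measures studied in the paper.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's code): a one-hidden-layer ReLU
# network viewed as a mixture of particles (c_j, w_j), trained by gradient
# descent on both the weights c and the positions W.
rng = np.random.default_rng(1)
n, d, m = 200, 2, 100                      # samples, input dim, particles
X = rng.normal(size=(n, d))
# synthetic target: itself a small ReLU network
y = np.maximum(X @ np.array([1.0, -1.0]), 0) - 0.5 * np.maximum(X @ np.array([0.5, 1.0]), 0)

c = rng.normal(scale=0.1, size=m)          # particle weights (output layer)
W = rng.normal(size=(m, d))                # particle positions (hidden layer)
eta = 0.01

def mse(c, W):
    return np.mean((np.maximum(X @ W.T, 0.0) @ c - y) ** 2)

print("mse before:", mse(c, W))
for t in range(10000):
    A = np.maximum(X @ W.T, 0.0)           # (n, m) particle activations
    err = A @ c - y                        # residuals
    gc = A.T @ err / n                     # gradient w.r.t. particle weights
    gW = ((err[:, None] * (A > 0.0)) * c).T @ X / n   # gradient w.r.t. positions
    c -= eta * gc
    W -= eta * gW
print("mse after: ", mse(c, W))
```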


Leveraging the two-timescale regime to demonstrate convergence of neural networks

Neural Information Processing Systems

Artificial neural networks are among the most successful modern machine learning methods, in particular because their non-linear parametrization provides a flexible way to implement feature learning (see, e.g., Goodfellow et al., 2016, chapter 15).
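The two-timescale regime in the title refers to training the two layers of a shallow network at very different speeds. As a rough illustration of that general idea (my own sketch; the paper's precise setting, scaling, and assumptions differ), the output weights below take much larger gradient steps than the neuron positions, so at every point of training they are nearly optimal for the current positions.

```python
import numpy as np

# Minimal sketch of the general two-timescale idea (illustrative only): the
# outer-layer weights use a much larger step size than the inner-layer
# positions, so the fast variable tracks its optimum for the slow variable.
rng = np.random.default_rng(2)
n, m = 200, 30
x = rng.uniform(-1.0, 1.0, size=n)
y = np.abs(x)                              # toy 1-d regression target

a = np.zeros(m)                            # outer weights (fast variable)
b = rng.uniform(-1.0, 1.0, size=m)         # inner positions (slow variable)
eta, eps = 1e-3, 1e-2                      # slow step size, timescale ratio

for t in range(20000):
    A = np.maximum(x[:, None] - b[None, :], 0.0)   # (n, m) ReLU features
    err = A @ a - y                        # residuals
    ga = A.T @ err / n                     # gradient w.r.t. outer weights
    gb = -a * ((A > 0.0).T @ err) / n      # gradient w.r.t. inner positions
    a -= (eta / eps) * ga                  # fast update
    b -= eta * gb                          # slow update

final_pred = np.maximum(x[:, None] - b[None, :], 0.0) @ a
print("final mse:", np.mean((final_pred - y) ** 2))
```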