Goto

Collaborating Authors

 Statistical Learning






ae614c557843b1df326cb29c57225459-Paper.pdf

Neural Information Processing Systems

In this work, we showthat this "lazy training" phenomenon isnot specific tooverparameterized neural networks, and is due to a choice of scaling, often implicit, that makes the model behave as its linearization around the initialization, thus yielding amodel equivalenttolearning withpositive-definite kernels.



Stochastic Chebyshev Gradient Descent for Spectral Optimization

Neural Information Processing Systems

Unfortunately, computing the gradient of a spectral function is generally of cubic complexity, as such gradient descent methods are rather expensive for optimizing objectives involving the spectral function.