







Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond. Taiji Suzuki, Denny Wu

Neural Information Processing Systems

Mean-field Langevin dynamics (MFLD) (Mei et al., 2018; Hu et al., 2019) is particularly attractive because MFLD arises from a noisy gradient descent update on the parameters, where Gaussian noise is injected into the gradient to encourage "exploration". Furthermore, uniform-in-time estimates of the particle discretization error have also been established (Suzuki et al.). The goal of this work is to address the following question.
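The noisy gradient descent update behind MFLD can be sketched as follows. This is a hypothetical toy illustration, not the paper's objective or implementation: each particle (neuron parameter) takes a gradient step on a loss plus injected Gaussian noise scaled by a temperature `lam`; the loss, step size, and particle count here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mfld_step(params, grad_fn, eta=0.05, lam=0.01):
    """One noisy gradient descent (Langevin) step:
    gradient descent plus Gaussian "exploration" noise."""
    noise = rng.standard_normal(params.shape)
    return params - eta * grad_fn(params) + np.sqrt(2.0 * eta * lam) * noise

# Toy quadratic loss 0.5 * ||p||^2 for illustration (its gradient is p).
grad = lambda p: p

# 100 particles (neurons), each with 2 parameters.
particles = rng.standard_normal((100, 2))
for _ in range(1000):
    particles = mfld_step(particles, grad)
```

For this quadratic loss, the particles settle near a Gaussian stationary distribution whose spread is controlled by `lam`, which is the "exploration" effect the noise is injected to produce.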



Neural Information Processing Systems

In this work, we show that this "lazy training" phenomenon is not specific to overparameterized neural networks, and is due to a choice of scaling, often implicit, that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels.
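The scaling effect described above can be illustrated numerically. The following is a hypothetical sketch (the model `f`, the scale values, and the dimensions are made up for the example): when the centered model output is multiplied by a large scale `alpha`, an O(1) change in the scaled model requires only an O(1/alpha) parameter movement, so the model stays close to its linearization around the initialization `w0`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
w0 = rng.standard_normal(50) / 50.0   # small init keeps tanh non-saturated

def f(w):
    return np.tanh(w @ x)             # toy nonlinear model

def grad_f(w):
    return (1.0 - np.tanh(w @ x) ** 2) * x

g = grad_f(w0)
errors = []
for alpha in (1.0, 10.0, 100.0):
    # Parameter step chosen so the scaled model alpha*(f(w) - f(w0))
    # moves by exactly 1 under the first-order (linearized) prediction.
    delta = g / (alpha * (g @ g))
    linear_pred = alpha * (g @ delta)            # = 1 by construction
    actual = alpha * (f(w0 + delta) - f(w0))     # true scaled-model change
    errors.append(abs(actual - linear_pred))
# errors shrink roughly like 1/alpha: the larger the scaling, the
# "lazier" the training, i.e. the better the linearization holds.
```

The shrinking linearization error is what makes the large-scale regime equivalent to learning with the (positive-definite) kernel induced by the model's gradient at initialization.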


accordingly to incorporate the comments. Reviewer #1 (stepsize and preset T): following the current analysis, for a general stepsize η

Neural Information Processing Systems

We appreciate the valuable comments and positive feedback from the reviewers. Without averaging the iterates, no convergence rate is available. In this paper we consider a neural network with one hidden layer. In particular, Proposition 4.7 shows that neural TD attains the global minimum of the MSBE (without the ...). We will revise the "without loss of generality" claim in the revision, and we will clarify this notation in the revision.