Higher Order Generalization Error for First Order Discretization of Langevin Diffusion
Li, Mufan Bill, Gazeau, Maxime
We propose a novel approach to analyze generalization error for discretizations of Langevin diffusion, such as the stochastic gradient Langevin dynamics (SGLD). For an $\epsilon$ tolerance of expected generalization error, it is known that a first order discretization can reach this target if we run $\Omega(\epsilon^{-1} \log (\epsilon^{-1}) )$ iterations with $\Omega(\epsilon^{-1})$ samples. In this article, we show that with additional smoothness assumptions, even first order methods can achieve arbitrarily runtime complexity. More precisely, for each $N>0$, we provide a sufficient smoothness condition on the loss function such that a first order discretization can reach $\epsilon$ expected generalization error given $\Omega( \epsilon^{-1/N} \log (\epsilon^{-1}) )$ iterations with $\Omega(\epsilon^{-1})$ samples.
Feb-11-2021
- Country:
- North America
- United States
- New York (0.04)
- Illinois > Champaign County
- Champaign (0.04)
- Canada > Ontario
- Toronto (0.14)
- United States
- Europe > Netherlands
- North Holland > Amsterdam (0.04)
- Asia > Middle East
- Jordan (0.04)
- North America
- Genre:
- Research Report (0.83)