Generalization Bounds for Gradient Methods via Discrete and Continuous Prior

Xuanyuan Luo

Neural Information Processing Systems

Proving algorithm-dependent generalization error bounds for gradient-type optimization methods has recently attracted significant attention in learning theory. However, most existing trajectory-based analyses require either restrictive assumptions on the learning rate (e.g., a rapidly decreasing learning rate) or continuously injected noise (such as the Gaussian noise in Langevin dynamics).
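For concreteness, the injected noise referred to above is of the Langevin type. A standard stochastic gradient Langevin dynamics update can be written as follows, where $\eta_t$ denotes the learning rate, $\beta$ an inverse-temperature parameter, and $\hat{L}_S$ the empirical loss on sample $S$; this notation is assumed here for illustration and is not taken from the paper itself:
\[
w_{t+1} = w_t - \eta_t \nabla \hat{L}_S(w_t) + \sqrt{\tfrac{2\eta_t}{\beta}}\, \xi_t, \qquad \xi_t \sim \mathcal{N}(0, I_d).
\]
The Gaussian term $\xi_t$ is the "continuous injected noise" that many trajectory-based analyses rely on.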