How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective

Lei Wu, Chao Ma, Weinan E

Neural Information Processing Systems 

Jastrzębski et al. [6] suggested that the ratio between the learning rate and the batch