How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective
