On the Local Minima of the Empirical Risk

Chi Jin, Lydia T. Liu, Rong Ge, Michael I. Jordan

Neural Information Processing Systems 

Even for applications with nonconvex, nonsmooth losses (such as modern deep networks), the population risk is generally significantly better behaved from an optimization point of view than the empirical risk.
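To make this contrast concrete, the following toy sketch compares a smooth one-dimensional population risk with an empirical risk that stays uniformly close to it but has many more local minima. Everything in it is an illustrative assumption rather than a construction from the paper: the quadratic `population_risk`, the sinusoidal perturbation in `empirical_risk`, and the parameters `nu` and `freq` are hypothetical choices made only to show the effect.

```python
import numpy as np

def population_risk(x):
    # Hypothetical smooth population risk with a single minimum at x = 0.
    return x ** 2

def empirical_risk(x, nu=0.05, freq=40.0):
    # Hypothetical empirical risk: the population risk plus a small,
    # rapidly oscillating perturbation whose magnitude is at most nu.
    return population_risk(x) + nu * np.sin(freq * x)

def count_local_minima(values):
    # Count strict interior local minima of a function sampled on a grid.
    interior = values[1:-1]
    return int(np.sum((interior < values[:-2]) & (interior < values[2:])))

xs = np.linspace(-1.0, 1.0, 20001)
pop_minima = count_local_minima(population_risk(xs))
emp_minima = count_local_minima(empirical_risk(xs))
# Largest pointwise gap between the two risks on the grid (bounded by nu).
max_gap = float(np.max(np.abs(empirical_risk(xs) - population_risk(xs))))

print("local minima of population risk:", pop_minima)
print("local minima of empirical risk: ", emp_minima)
print("max |empirical - population|:   ", max_gap)
```

In this toy setting the two functions never differ by more than `nu`, yet the perturbed one has several spurious local minima where the smooth one has exactly one. A local search method applied to the perturbed function could therefore stall far from the minimizer of the smooth function.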