xxt ututt
c82836ed448c41094025b4a872c5341e-Supplemental.pdf
Recently there has been significant theoretical progress on understanding the convergence andgeneralization ofgradient-based methods onnonconvexlosses withoverparameterized models. Nevertheless, manyaspectsofoptimization and generalization and in particular the critical role of small random initialization are not fully understood.