An Improved Analysis of Training Over-parameterized Deep Neural Networks

Difan Zou, Quanquan Gu

Neural Information Processing Systems 

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure global convergence is very stringent, often a high-degree polynomial in the training sample size n (e.g., O(n^24)).
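As a rough illustration of the regime the abstract describes, the following is a minimal sketch (not the paper's construction) of full-batch gradient descent on a randomly initialized two-layer ReLU network. The data, the 1/sqrt(m) output scaling, and the choice to train only the first layer are all assumptions made for this toy example, loosely following common NTK-style setups; when the width m is much larger than the sample size n, the training loss typically decreases toward zero.

```python
import numpy as np

# Hypothetical toy example (not the paper's exact setting): full-batch
# gradient descent on a width-m two-layer ReLU network. Only the first
# layer is trained; the output weights are fixed random signs, and the
# 1/sqrt(m) scaling follows common NTK-style analyses.
rng = np.random.default_rng(0)
n, d, m = 20, 10, 2000                          # samples, input dim, width (m >> n)
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))                 # random initialization
a = rng.choice([-1.0, 1.0], size=m)             # fixed output layer

lr = 0.1
for step in range(1001):
    H = X @ W.T                                 # (n, m) pre-activations
    pred = np.maximum(H, 0.0) @ a / np.sqrt(m)  # network outputs, shape (n,)
    residual = pred - y
    if step % 200 == 0:
        print(f"step {step:4d}  loss {0.5 * np.sum(residual**2):.6f}")
    # Gradient of 0.5 * ||pred - y||^2 w.r.t. W, using the ReLU active mask.
    mask = (H > 0).astype(float)                # (n, m)
    grad = (mask * (residual[:, None] * a[None, :])).T @ X / np.sqrt(m)
    W -= lr * grad
```

In this sketch the printed loss drops to near zero, which is the qualitative behavior the convergence results concern; the paper's contribution is in how large m must be relative to n for such convergence to be guaranteed.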
