The Implicit Bias of AdaGrad on Separable Data

Qian, Qian, Qian, Xiaoyuan

arXiv.org Machine Learning 

In recent years, implicit regularization induced by various optimization algorithms has played a crucial role in the generalization ability of deep neural networks [Salakhutdinov and Srebro, 2015, Neyshabur et al., 2015, Keskar et al., 2016, Neyshabur et al., 2017, Zhang et al., 2017]. For example, in underdetermined problems where the number of parameters exceeds the number of training examples, many global optima fail to generalize well; nevertheless, a specific optimization algorithm (such as gradient descent) converges to a particular optimum that does generalize well, even though no explicit regularization is enforced during training.
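A minimal sketch of this phenomenon, on a toy underdetermined least-squares problem rather than a neural network: gradient descent initialized at zero converges to the minimum-norm interpolating solution, even though no norm penalty appears in the objective. The problem sizes and learning rate below are illustrative choices, not taken from the paper.

```python
import numpy as np

# Underdetermined least squares: more parameters (d=10) than examples (n=3),
# so infinitely many w satisfy A w = b exactly.
rng = np.random.default_rng(0)
n, d = 3, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Gradient descent on f(w) = 0.5 * ||A w - b||^2 from w = 0.
# Every gradient A^T (A w - b) lies in the row space of A, so the iterates
# stay in that row space and converge to the minimum-norm solution A^+ b.
w = np.zeros(d)
lr = 0.01  # step size chosen below 2 / lambda_max(A^T A) for this instance
for _ in range(50_000):
    w -= lr * A.T @ (A @ w - b)

w_min_norm = np.linalg.pinv(A) @ b  # minimum-norm interpolant via pseudoinverse
print(np.allclose(w, w_min_norm, atol=1e-6))  # implicit bias toward min norm
```

Among all interpolating solutions, gradient descent selects the one with the smallest Euclidean norm, which is the simplest instance of the implicit-bias effect the paper studies for AdaGrad on separable data.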
