The Implicit Bias of AdaGrad on Separable Data
In recent years, implicit regularization from various optimization algorithms plays a crucial role in the generalizatiion abilities in training deep neural networks [Salakhutdinov and Srebro, 2015, Neyshabur et al., 2015, Keskar et al., 2016, Neyshabur et al., 2017, Zhang et al., 2017]. For example, in underdetermined problems where the number of parameters is larger than the number of training examples, many global optimum fail to exhibit good generalization properties, however, a specific optimization algorithm (such as gradient descent) does converge to a particular optimum that generalize well, although no explicit regularization is enforced when training the model.
Jun-9-2019
- Country:
- Asia > China
- Liaoning Province > Dalian (0.04)
- North America > United States
- Ohio > Franklin County > Columbus (0.04)
- Asia > China
- Genre:
- Research Report (0.82)
- Technology: