[1606.04474] Learning to learn by gradient descent by gradient descent • /r/MachineLearning
One thing I'm not sure about is how fair their comparison is. They fix the global learning rate for the "hand-designed" algorithms and choose it by grid search. However, we know well that in most problems we can start with a larger learning rate and decay it over time once the loss plateaus. The problem with not considering this is that the best fixed global learning rate for the whole run would probably be a small one, which trains slowly at first but eventually outperforms faster ones. Nevertheless, this is interesting work, although I'm still quite skeptical that such optimizers will generalize well to large models.
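To make the decay-on-plateau idea concrete, here's a rough sketch in Python. It is not from the paper. The function name `plateau_lr_schedule` and its `factor`/`patience` parameters are my own, loosely in the spirit of reduce-on-plateau schedulers such as PyTorch's `ReduceLROnPlateau`, whose exact counting rules differ. It replays a sequence of losses and returns the learning rate in effect after each one:

```python
from typing import List


def plateau_lr_schedule(losses: List[float], lr0: float = 0.1,
                        factor: float = 0.5, patience: int = 2,
                        min_delta: float = 0.0) -> List[float]:
    """Hypothetical helper: learning rate in effect after each observed loss.

    Starts at lr0. Whenever the loss fails to beat the best value seen so far
    (by more than min_delta) for `patience` consecutive observations, the rate
    is multiplied by `factor` and the bad-step counter resets.
    """
    lr = lr0
    best = float("inf")
    bad_steps = 0
    lrs = []
    for loss in losses:
        if loss < best - min_delta:
            # Progress: remember the new best and reset the plateau counter.
            best = loss
            bad_steps = 0
        else:
            # No improvement: count it, and decay once the plateau is long enough.
            bad_steps += 1
            if bad_steps >= patience:
                lr *= factor
                bad_steps = 0
        lrs.append(lr)
    return lrs


if __name__ == "__main__":
    # Loss improves, stalls, improves, stalls: the rate halves at each stall.
    print(plateau_lr_schedule([1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7]))
```

With `factor=1.0` this collapses to a constant learning rate, i.e. the only kind of schedule a grid search over a single global rate explores, which is the baseline I'm questioning.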
Jun-15-2016, 03:10:13 GMT