Tonga
Robot boat maps Pacific underwater volcano
The vessel, developed by the British company Sea-Kit International, is surveying the volcano as part of the second phase of the Tonga Eruption Seabed Mapping Project (TESMaP), led by New Zealand's National Institute of Water and Atmospheric Research (Niwa) and funded by the Nippon Foundation of Japan.
Topmoumoute Online Natural Gradient Algorithm
Le Roux, Nicolas; Manzagol, Pierre-Antoine; Bengio, Yoshua
Guided by the goal of obtaining an optimization algorithm that is fast and generalizes well, we study the descent direction that maximizes the decrease in generalization error or the probability of not increasing generalization error. The surprising result is that, from both the Bayesian and frequentist perspectives, this can yield the natural gradient direction. Although that direction can be very expensive to compute, we develop an efficient, general, online approximation to natural gradient descent that is suited to large-scale problems. We report experimental results showing much faster convergence, in both computation time and number of iterations, with TONGA (Topmoumoute Online natural Gradient Algorithm) than with stochastic gradient descent, even on very large datasets.
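To make the contrast with plain stochastic gradient descent concrete, here is a minimal sketch of a natural-gradient-style direction, assuming the direction is obtained by preconditioning the mean gradient with the inverse of a damped, uncentered second-moment matrix of per-example gradients. It is a generic exact-solve illustration, not the efficient online approximation the abstract describes; the function name `natural_gradient_direction`, the `damping` parameter, and the toy least-squares problem are all hypothetical.

```python
import numpy as np


def natural_gradient_direction(per_example_grads, damping=1e-3):
    """Solve (C + damping * I) d = g for d.

    g is the mean of the per-example gradients and C is their uncentered
    second-moment matrix. Both the damping term and the uncentered estimate
    are assumptions of this sketch, not details taken from the abstract.
    """
    grads = np.asarray(per_example_grads, dtype=float)
    n, d = grads.shape
    mean_grad = grads.mean(axis=0)
    second_moment = grads.T @ grads / n
    return np.linalg.solve(second_moment + damping * np.eye(d), mean_grad)


# Toy problem (hypothetical): least squares with badly scaled features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 0.1])
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
residual = X @ w - y
grads = residual[:, None] * X      # per-example gradient of 0.5 * (x.w - y)^2
mean_grad = grads.mean(axis=0)     # direction plain gradient descent would use
direction = natural_gradient_direction(grads)

# One update step; the step size of 1.0 is arbitrary.
w_next = w - 1.0 * direction
```

The exact solve costs on the order of d^3 operations for d parameters, which is the expense the abstract refers to and the reason an approximation is needed at scale.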