Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms
Lovas, Attila, Lytras, Iosif, Rásonyi, Miklós, Sabanis, Sotirios
A new generation of stochastic gradient descent algorithms, namely stochastic gradient Langevin dynamics (SGLD), can be efficient in finding global minimizers of possibly complicated, high-dimensional landscapes under suitable regularity assumptions for the gradient, see Raginsky et al. (2017), Welling and Teh (2011) and references therein. However, in the specific case of tuning artificial neural networks (ANNs), henceforth simply neural networks, problems arise already at the theoretical level. As discussed in some detail in Section 4 below, the functionals to be minimized fail any form of dissipativity, which should be a sine qua non for any stable gradient algorithm. Adding a quadratic regularization term cannot always remedy this, in which case one needs to replace it with a higher order penalty term. However, adding such a term violates the global Lipschitz continuity of the regularized gradient, which in turn renders the use of gradient descent methods problematic, as can be seen in Figure 2. This issue has been highlighted in the case of Euler discretizations (of which SGLD is an example) in Hutzenthaler et al. (2011), where it is proven that the difference between the exact solution of the corresponding stochastic differential equation (SDE) and the numerical approximation, even at a finite time point, diverges to infinity in the strong mean square sense. A natural way to address the above issue is to combine higher order regularization with taming techniques to improve the stability of any resulting algorithm. In particular, the use of taming techniques in the construction of stable numerical approximations for nonlinear SDEs has gained substantial attention in recent years and was introduced by Hutzenthaler et al. (2012) and, independently,

All the authors were supported by The Alan Turing Institute, London under the EPSRC grant EP/N510129/1. A. L. and M. R. acknowledge the "Lendület" grant LP 2015-6 of the Hungarian Academy of Sciences.
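The instability of Euler-type Langevin updates for gradients that are not globally Lipschitz, and the stabilizing effect of taming, can be illustrated with a minimal one-dimensional sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy potential U(θ) = θ⁴/4, the function names `grad` and `langevin`, the step size, the inverse temperature `beta`, and the divergence threshold. The taming factor 1 / (1 + step · |gradient|) follows the generic tamed Euler idea and is not claimed to be the TUSLA update itself.

```python
import math
import random


def grad(theta):
    # Gradient of the toy potential U(theta) = theta**4 / 4.
    # It grows cubically, so it is not globally Lipschitz.
    return theta ** 3


def langevin(theta0, step, beta, n_steps, tamed, seed=0):
    """Run a 1-D Langevin-type iteration; return the last iterate,
    or math.inf once |theta| exceeds an (arbitrary) blow-up threshold."""
    rng = random.Random(seed)
    theta = theta0
    noise_scale = math.sqrt(2.0 * step / beta)
    for _ in range(n_steps):
        g = grad(theta)
        if tamed:
            # Illustrative taming: the scaled drift step * g is bounded by 1.
            g = g / (1.0 + step * abs(g))
        theta = theta - step * g + noise_scale * rng.gauss(0.0, 1.0)
        if abs(theta) > 1e6:
            return math.inf  # treat as divergence and stop early
    return theta


# Hypothetical settings: a large initial value and a moderately large step.
plain = langevin(10.0, 0.1, 1e4, 200, tamed=False)
tame = langevin(10.0, 0.1, 1e4, 200, tamed=True)
print("untamed:", plain)
print("tamed:  ", tame)
```

With these settings the untamed iteration overshoots further at every step and is flagged as divergent, while the tamed iteration moves by less than one unit per step (up to noise) and settles near the minimizer at zero. The behaviour depends on the chosen step size and starting point; a sufficiently small step would also keep the untamed scheme stable from this particular start.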
Jun-25-2020