Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks
Lim, Dong-Young, Sabanis, Sotirios
Artificial neural networks (ANNs) are successfully trained when they are finely tuned via the optimization of their associated loss functions. Two aspects of such optimization tasks pose significant challenges, namely the non-convex nature of the loss functions and the highly nonlinear features of many types of ANNs. Moreover, the analysis in Lovas et al. [2020] shows that the gradients of such non-convex loss functions typically grow faster than linearly and are only locally Lipschitz continuous. Naturally, stability issues are observed, known as the 'exploding gradient' phenomenon (Bengio et al. [1994] and Pascanu et al. [2013]), when vanilla stochastic gradient descent (SGD) or certain types of adaptive algorithms are used for fine-tuning. Section 2 provides a simple but transparent example of why this phenomenon is observed, even when some of the most popular adaptive algorithms are employed. One further notes that occurrences of vanishing gradients are often reported in the ANN literature (Zhang et al. [2018] and Pascanu et al. [2013]). This phenomenon seems to particularly affect the performance of TUSLA (Lovas et al. [2020]) in our experiments when compared with other popular algorithms such as AdaGrad (Duchi et al. [2011]), RMSProp (Tieleman and Hinton [2012]), ADAM (Kingma and Ba [2015]) and AMSGrad (Reddi et al. [2018]). This is observed despite TUSLA's stability properties, which successfully control any potential 'exploding gradient' occurrences.
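As a rough illustration of the instability described above (not the paper's experiments), the minimal Python sketch below contrasts a plain gradient/Langevin step with a tamed step on the one-dimensional loss f(θ) = θ⁴/4, whose gradient θ³ grows superlinearly and is only locally Lipschitz. The starting point, step size, inverse temperature and taming factor are illustrative assumptions, and the tamed update is a simplified stand-in for the TUSLA recursion of Lovas et al. [2020], not the exact algorithm.

```python
import numpy as np

# Toy loss with a superlinearly growing gradient: f(theta) = theta**4 / 4,
# so grad f(theta) = theta**3 is only locally Lipschitz continuous.
def grad(theta):
    return theta ** 3

def run(tamed, theta0=5.0, lam=0.1, beta=1e8, n_steps=50, seed=0):
    """One-dimensional Langevin-type recursion, with or without taming (illustrative)."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(n_steps):
        g = grad(theta)
        if tamed:
            # Taming in the spirit of TUSLA: the drift is divided by a factor that
            # grows with |theta|, so each step remains bounded (simplified stand-in).
            g = g / (1.0 + np.sqrt(lam) * theta ** 2)
        theta = theta - lam * g + np.sqrt(2.0 * lam / beta) * rng.standard_normal()
        if abs(theta) > 1e100:
            return theta  # the iterates have already blown up
    return theta

print("vanilla step:", run(tamed=False))  # explodes: lam * |grad| overshoots badly
print("tamed step:  ", run(tamed=True))   # stays bounded, drifts toward the minimiser at 0
```

With these (assumed) settings the untamed iterates overshoot and diverge within a handful of steps, whereas the tamed iterates remain bounded, which is the stability behaviour the abstract attributes to TUSLA.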
May-28-2021