In almost all shallow analytic neural network optimization landscapes, efficient minimizers have strongly convex neighborhoods

Benning, Felix, Dereich, Steffen

arXiv.org Artificial Intelligence 

Artificial neural networks (ANNs) define parametrized families of functions (the realization functions) whose definition is inspired by biological neural networks. Running optimization algorithms on these parametrized families (the training of neural networks) has proven to be very efficient in various machine learning tasks, including image recognition, natural language processing, autonomous systems, protein folding, and climate modelling. The preferred methods for the training of ANNs are Stochastic Gradient Descent (SGD) algorithms. The vanilla SGD algorithm was first applied in Rumelhart et al. [1986]. Today, variants such as momentum-based methods [Polyak, 1964], RMSProp [Hinton, 2012] and the Adam optimizer [Kingma and Ba, 2015] are more commonly used. Generally, the efficiency of optimization algorithms is significantly affected by the structure of the optimization landscape. The smoothing of updates in the momentum approaches seems to help with saddle points, and adaptive methods like RMSProp and Adam seem to adjust learning rates better to navigate complex landscapes effectively. Mathematically rigorous approaches often assume that the SGD scheme converges to a (local) minimum with a strongly convex neighborhood (meaning that the Hessian of the landscape is strictly positive definite) or that a Polyak-Łojasiewicz inequality (in the strong sense with exponent 2) applies.
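The two assumptions named in the last sentence can be written out as follows. This is only a sketch in commonly used notation, not necessarily the notation of the paper: the symbols f (the loss landscape), θ* (the local minimum), U (a neighborhood of θ*) and μ (a positive constant) are assumptions introduced here for illustration, and "exponent 2" is read as referring to the squared gradient norm.

```latex
% Assumed notation: f = loss landscape, \theta^* = local minimum,
% U = neighborhood of \theta^*, \mu > 0 a constant.

% (1) Strongly convex neighborhood: strictly positive definite Hessian at \theta^*.
\[
  \nabla^2 f(\theta^*) \succ 0
  \quad\Longleftrightarrow\quad
  \lambda_{\min}\bigl(\nabla^2 f(\theta^*)\bigr) > 0 .
\]

% (2) Polyak-Lojasiewicz inequality on U, stated with the squared gradient norm.
\[
  \tfrac{1}{2}\,\lVert \nabla f(\theta) \rVert^{2}
  \;\ge\; \mu\,\bigl( f(\theta) - f(\theta^*) \bigr)
  \qquad \text{for all } \theta \in U .
\]
```

For a twice continuously differentiable f, condition (1) implies condition (2) on a sufficiently small neighborhood, while the converse can fail, for example when the minimizers form a continuum and the Hessian is degenerate along it.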
