In almost all shallow analytic neural network optimization landscapes, efficient minimizers have strongly convex neighborhoods

Apr-15-2025–arXiv.org Artificial Intelligence

Artificial neural networks (ANNs) define parametrized families of functions (the realization functions) whose definition is inspired by biological neural networks. Running optimization algorithms on these parametrized families (the training of neural networks) has proven very efficient in various machine learning tasks, including image recognition, natural language processing, autonomous systems, protein folding, and climate modelling. The preferred methods for training ANNs are stochastic gradient descent (SGD) algorithms. The vanilla SGD algorithm was first applied in Rumelhart et al. [1986]. Today, variants such as momentum-based methods [Polyak, 1964], RMSProp [Hinton, 2012] and the Adam optimizer [Kingma and Ba, 2015] are more commonly used. Generally, the efficiency of optimization algorithms is significantly affected by the structure of the optimization landscape. The smoothing of updates in the momentum approaches seems to help with saddle points, and adaptive methods like RMSProp and Adam seem to adjust learning rates better to navigate complex landscapes effectively. Mathematically rigorous approaches often assume that the SGD scheme converges to a (local) minimum with a strongly convex neighborhood (meaning that the Hessian of the landscape is strictly positive definite) or that a Polyak-Łojasiewicz inequality (in the strong sense with exponent 2) applies.
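To make the strong-convexity assumption concrete, the following minimal sketch checks numerically whether the Hessian at a known minimizer is strictly positive definite. Everything in it is an illustrative assumption rather than the paper's setup: the one-neuron network `v * tanh(w * x)`, the target `tanh(x)`, the sample points `xs`, and the helper names `loss` and `numerical_hessian` are all hypothetical.

```python
import numpy as np


def loss(theta, xs):
    """Mean squared error of the toy network v * tanh(w * x) against the target tanh(x)."""
    v, w = theta
    return np.mean((v * np.tanh(w * xs) - np.tanh(xs)) ** 2)


def numerical_hessian(f, theta, eps=1e-4):
    """Central finite-difference approximation of the Hessian of f at theta."""
    n = theta.size
    hess = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = eps
        for j in range(n):
            ej = np.zeros(n)
            ej[j] = eps
            hess[i, j] = (
                f(theta + ei + ej) - f(theta + ei - ej)
                - f(theta - ei + ej) + f(theta - ei - ej)
            ) / (4 * eps ** 2)
    return hess


# Hypothetical training inputs; the target is realized exactly at (v, w) = (1, 1).
xs = np.linspace(-2.0, 2.0, 21)
theta_star = np.array([1.0, 1.0])

H = numerical_hessian(lambda t: loss(t, xs), theta_star)
# Symmetrize before the eigenvalue computation to suppress finite-difference noise.
eigenvalues = np.linalg.eigvalsh((H + H.T) / 2)

print("loss at minimizer:", loss(theta_star, xs))
print("Hessian eigenvalues:", eigenvalues)
print("strictly positive definite:", bool(eigenvalues.min() > 0))
```

If the smallest eigenvalue is positive, continuity of the Hessian of an analytic loss implies that the landscape is strongly convex in some neighborhood of the minimizer. Finite differences are used here only to keep the sketch dependency-free; an automatic differentiation library would be the more usual choice for larger networks.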
