AITopics | baseline optimizer

Optimizing Optimizers for Fast Gradient-Based Learning

Lee, Jaerin, Lee, Kyoung Mu

arXiv.org Machine Learning, Dec-9-2025

We lay the theoretical foundation for automating optimizer design in gradient-based learning. Based on the greedy principle, we formulate the problem of designing optimizers as maximizing the instantaneous decrease in loss. By treating an optimizer as a function that translates loss gradient signals into parameter motions, the problem reduces to a family of convex optimization problems over the space of optimizers. Solving these problems under various constraints not only recovers a wide range of popular optimizers as closed-form solutions, but also produces the optimal hyperparameters of these optimizers with respect to the problems at hand. This enables a systematic approach to design optimizers and tune their hyperparameters according to the gradient statistics that are collected during the training process. Furthermore, this optimization of optimization can be performed dynamically during training.

Just as optimizers train their models by feeding them parameter velocities $\dot{\theta}$, models can also fit the optimizers to the underlying tasks by feeding gradients $g$. We are interested in the problem of designing optimizers that maximize the utility of gradient-based learning for a given task. The process of learning manifests as the parameter motion $\dot{\theta}$ driven by the gradient force $g$ applied at each step $t$. Physics requires a constitutive law that relates kinematic motion to its motive force. In gradient-based learning, optimizers take that role. We can represent an optimizer as a positive semidefinite operator $Q \succeq 0$ that linearly translates the gradients into the parameter updates,

$\dot{\theta} = Q g$. (1)

Later sections will reveal that many existing optimizers fall into this category. Under this rule, the instantaneous loss drop is the quadratic form

$g^\top \dot{\theta} = g^\top Q g$. (2)

Adhering to the greedy paradigm, we turn our original problem of maximizing the utility of learning into a different optimization problem that maximizes this loss drop with respect to the optimizer $Q$:

maximize $\mathbb{E}[g^\top Q g]$ subject to $Q \in \mathcal{Q}$. (P1)

Problem P1 reveals two design options that bound this maximum: (1) the trust region implied by the feasible set $Q \in \mathcal{Q}$, and (2) the gradient distribution under the expectation $\mathbb{E}$. Our main focus is on how these two factors determine the optimal optimizer $Q^\star$.

Optimizers and their hyperparameters can be dynamically tuned or even be replaced by better ones according to the intermediate probes from the gradients in the middle of training. By reverse engineering commonly used optimizers, we draw the landscape of optimizers that have driven the success of machine learning (Robbins & Monro, 1951; Kingma & Ba, 2015; Loshchilov & Hutter, 2019; Gupta et al., 2018; Martens & Grosse, 2015) into a single picture. This lets us better use the well-studied optimizers in practice and also suggest extensions to them. Note that $\Sigma$ is a symmetric and positive semidefinite (PSD) matrix of shape $d \times d$.
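The claim that a constrained maximization over Q can have a closed-form solution can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not the paper's construction: it takes the loss drop to be the quadratic form g^T Q g, takes Σ to be the empirical second-moment matrix of the gradients, and picks a Frobenius-norm ball of radius r as the feasible set. The constraint, the radius, the synthetic gradient samples, and all function names are hypothetical choices made here. Under these assumptions E[g^T Q g] = tr(QΣ), and the Cauchy-Schwarz inequality gives the maximizer Q = rΣ/‖Σ‖_F with value r‖Σ‖_F.

```python
import math
import random

def second_moment(grads):
    """Empirical second-moment matrix Sigma = mean of g g^T (symmetric PSD)."""
    d = len(grads[0])
    sigma = [[0.0] * d for _ in range(d)]
    for g in grads:
        for i in range(d):
            for j in range(d):
                sigma[i][j] += g[i] * g[j] / len(grads)
    return sigma

def frobenius(m):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in m for x in row))

def expected_drop(q, sigma):
    """E[g^T Q g] = trace(Q Sigma); for symmetric Sigma this is sum_ij Q_ij Sigma_ij."""
    n = len(q)
    return sum(q[i][j] * sigma[i][j] for i in range(n) for j in range(n))

def optimal_q(sigma, radius):
    """Maximizer of trace(Q Sigma) over the ball ||Q||_F <= radius (assumed constraint)."""
    scale = radius / frobenius(sigma)
    return [[scale * x for x in row] for row in sigma]

rng = random.Random(0)
# Hypothetical gradient samples with very different per-coordinate scales.
grads = [[rng.gauss(0, 3.0), rng.gauss(0, 1.0), rng.gauss(0, 0.1)] for _ in range(500)]
sigma = second_moment(grads)
radius = 1.0
d = len(sigma)

q_star = optimal_q(sigma, radius)
# Baseline: a scaled identity with the same Frobenius norm (plain gradient descent).
q_sgd = [[radius / math.sqrt(d) if i == j else 0.0 for j in range(d)] for i in range(d)]

drop_star = expected_drop(q_star, sigma)
drop_sgd = expected_drop(q_sgd, sigma)
print("closed-form optimizer:", drop_star)
print("scaled identity      :", drop_sgd)
```

With the same norm budget, the closed-form solution never yields a smaller expected loss drop than the scaled identity. Swapping the feasible set for a different one would change the closed form, which is the sense in which the choice of constraint shapes the resulting optimizer.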

hyperparameter, optimal optimizer, optimizer, (14 more...)

2512.0637

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

From Rattle to Roar: Optimizer Showdown for MambaStock on S&P 500

Chan, Alena, Garmonina, Maria

arXiv.org Artificial Intelligence, Aug-8-2025

We evaluate the performance of several optimizers on the task of forecasting S&P 500 Index returns with the MambaStock model. Among the most widely used algorithms, gradient-smoothing and adaptive-rate optimizers (for example, Adam and RMSProp) yield the lowest test errors. In contrast, the Lion optimizer offers notably faster training. To combine these advantages, we introduce a novel family of optimizers, Roaree, that dampens the oscillatory loss behavior often seen with Lion while preserving its training speed.
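The oscillation attributed to Lion has a simple mechanical source: its update is the sign of a momentum-interpolated gradient, so every step has the same magnitude regardless of how close the iterate is to a minimum. The sketch below applies the Lion update rule as published by its authors (Chen et al., 2023) to a scalar toy problem and compares it with plain gradient descent. The quadratic objective, the learning rate, and the step count are assumptions chosen for illustration. The toy does not reproduce MambaStock, the S&P 500 experiments, or Roaree, whose update rule the summary above does not give.

```python
def sign(v):
    """Sign of a scalar: -1, 0, or 1."""
    return (v > 0) - (v < 0)

def lion_step(x, m, grad, lr=0.1, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update for a scalar parameter x with momentum m."""
    c = beta1 * m + (1 - beta1) * grad         # interpolated update direction
    x = x - lr * (sign(c) + weight_decay * x)  # sign update: magnitude lr when c != 0
    m = beta2 * m + (1 - beta2) * grad         # momentum tracks the gradient
    return x, m

def grad_f(x):
    """Gradient of the toy objective f(x) = 0.5 * x**2."""
    return x

lr = 0.1
x_lion, m = 1.0, 0.0
x_gd = 1.0
lion_path = []
for _ in range(200):
    x_lion, m = lion_step(x_lion, m, grad_f(x_lion), lr=lr)
    lion_path.append(x_lion)
    x_gd -= lr * grad_f(x_gd)  # plain gradient descent for comparison

lion_tail = lion_path[-50:]
print("gradient descent final |x|:", abs(x_gd))
print("Lion max |x| over last 50 steps:", max(abs(v) for v in lion_tail))
```

Gradient descent shrinks its steps with the gradient and settles at the minimum, while the fixed-size sign step keeps Lion bouncing around it at the scale of the learning rate. In practice a learning-rate schedule is the usual way to damp this.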

artificial intelligence, machine learning, optimizer, (14 more...)

2508.04707

Country: North America (0.15)

Genre: Research Report (0.50)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Universal Majorization-Minimization Algorithms

Streeter, Matthew

arXiv.org Artificial Intelligence, Jul-31-2023

Majorization-minimization (MM) is a family of optimization methods that iteratively reduce a loss by minimizing a locally-tight upper bound, called a majorizer. Traditionally, majorizers were derived by hand, and MM was only applicable to a small number of well-studied problems. We present optimizers that instead derive majorizers automatically, using a recent generalization of Taylor mode automatic differentiation. These universal MM optimizers can be applied to arbitrary problems and converge from any starting point, with no hyperparameter tuning.
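The MM loop itself is short once a majorizer is available. The sketch below uses a classical hand-derived majorizer, which is the traditional approach the abstract contrasts itself with, and not the automatic Taylor-mode derivation the paper proposes. The objective (softplus minus a linear term), the curvature bound of 1/4, the starting point, and the function names are all assumptions chosen for illustration. Because the second derivative of the objective never exceeds 1/4, the quadratic built at the current point is an upper bound that touches the loss there, and minimizing it gives the next iterate in closed form.

```python
import math

def f(x):
    """Smooth convex toy loss: softplus(x) - 0.3 * x."""
    return math.log1p(math.exp(x)) - 0.3 * x

def f_prime(x):
    """Derivative of f: sigmoid(x) - 0.3."""
    return 1.0 / (1.0 + math.exp(-x)) - 0.3

# f''(x) = s(x) * (1 - s(x)) <= 1/4 for the sigmoid s, so 1/4 bounds the curvature.
CURVATURE_BOUND = 0.25

def majorizer(y, x):
    """Quadratic upper bound on f(y), tight at y == x."""
    return f(x) + f_prime(x) * (y - x) + 0.5 * CURVATURE_BOUND * (y - x) ** 2

def mm_step(x):
    """Minimize the majorizer built at x; a quadratic has a closed-form minimizer."""
    return x - f_prime(x) / CURVATURE_BOUND

x = 5.0
losses = [f(x)]
for _ in range(50):
    x = mm_step(x)
    losses.append(f(x))

print("final x:", x)
print("final loss:", losses[-1])
```

No step size is tuned here: the update follows entirely from the bound, and the loss cannot increase from one iterate to the next because each new point minimizes an upper bound that equals the loss at the previous point.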

adagrad, artificial intelligence, machine learning, (17 more...)

2308.0019

Country:

North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Google Brain Paper Demystifies Learned Optimizers

#artificialintelligence, Nov-13-2020, 03:51:30 GMT

Learned optimizers are algorithms that can be trained to solve optimization problems. Although learned optimizers can outperform baseline optimizers in restricted settings, the ML research community understands remarkably little about their inner workings or why they work as well as they do. In a paper currently under review for ICLR 2021, a Google Brain research team attempts to shed some light on the matter. The researchers explain that optimization algorithms can be considered the basis of modern machine learning. A popular research area in recent years has focused on learning optimization algorithms by directly parameterizing and training an optimizer on a distribution of tasks. Research on learned optimizers aims to replace the baseline "hand-designed" optimizers with a parametric optimizer trained on a set of tasks, which can then be applied more generally.
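The idea of training a parametric optimizer on a distribution of tasks can be shown in a deliberately tiny form. In the sketch below the "optimizer" has a single parameter (a step size), the task distribution is a family of one-dimensional quadratics, and meta-training is reduced to a grid search over that parameter. All of these are simplifying assumptions made for illustration: the learned optimizers discussed in the article are neural networks trained with far more elaborate procedures, and the function names, task family, and baseline value here are hypothetical.

```python
def inner_train(step_size, curvature, steps=10):
    """Run the parametric optimizer on f(x) = 0.5 * curvature * x**2 from x = 1."""
    x = 1.0
    for _ in range(steps):
        x -= step_size * curvature * x  # update rule with one learnable parameter
    return 0.5 * curvature * x * x      # final loss on this task

def meta_loss(step_size, tasks):
    """Average final loss over a set of tasks (each task is one curvature value)."""
    return sum(inner_train(step_size, c) for c in tasks) / len(tasks)

# Training tasks and held-out tasks drawn from the same curvature range [0.5, 2.0].
train_tasks = [0.5 + 1.5 * i / 19 for i in range(20)]
heldout_tasks = [0.5 + 1.5 * (i + 0.5) / 19 for i in range(19)]

# "Meta-training": choose the parameter that minimizes the loss over training tasks.
candidates = [10 ** (-3 + 0.1 * i) for i in range(40)]
learned_step = min(candidates, key=lambda s: meta_loss(s, train_tasks))

# Stand-in for a hand-designed default that was not tuned on these tasks.
baseline_step = 0.01

learned_heldout = meta_loss(learned_step, heldout_tasks)
baseline_heldout = meta_loss(baseline_step, heldout_tasks)
print("learned step size:", learned_step)
print("held-out loss, learned :", learned_heldout)
print("held-out loss, baseline:", baseline_heldout)
```

The parameter fitted on the training tasks also does well on held-out tasks from the same family, which is the restricted setting in which learned optimizers are said to outperform baselines. Nothing here indicates how such an optimizer would behave on tasks outside that family.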

baseline optimizer, brain paper demystify learned optimizer, optimizer, (7 more...)

Country: Asia > China (0.07)

Genre: Research Report > New Finding (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.79)

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

Metz, Luke, Maheswaranathan, Niru, Freeman, C. Daniel, Poole, Ben, Sohl-Dickstein, Jascha

arXiv.org Machine Learning, Sep-23-2020

Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural-network-parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization. Most learned optimizers have been trained on only a single task, or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well, but learn behaviors that are distinct from existing first-order optimizers. For instance, they generate update steps that have implicit regularization and adapt as the problem hyperparameters (e.g., batch size) or architecture (e.g., neural network width) change. Finally, these learned optimizers show evidence of being useful for out-of-distribution tasks such as training themselves from scratch.

artificial intelligence, machine learning, optimizer, (17 more...)

2009.11243

Country:

North America > United States > Texas (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.81)

Industry:

Education (0.46)
Leisure & Entertainment (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
