AITopics | asgd

Collaborating Authors

asgd

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

Mahran, Ammar, Maranjyan, Artavazd, Richtárik, Peter

arXiv.org Machine LearningMay-14-2026

Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient arrives. Vanilla ASGD applies each arriving gradient with the same weight. When local data distributions are heterogeneous, this becomes problematic: faster workers contribute more updates, and we show theoretically that the method is biased toward a frequency-weighted average of the local objectives rather than the desired global objective. Existing remedies typically move away from the simple ASGD template by introducing gathering phases, buffering, or extra memory. We show that this is unnecessary. Keeping the standard ASGD mechanism, we recover the correct objective by rescaling worker-specific stepsizes in proportion to their computation times, so that each worker contributes the same aggregate learning rate over a cycle. In the non-convex setting, under smoothness and bounded heterogeneity assumptions, we prove that the resulting method, Rescaled ASGD, converges to stationary points of the correct global objective in the fixed-computation model. Its time complexity matches the known lower bound in the leading term, while the effects of staleness and data heterogeneity appear only in lower-order terms. Experiments confirm that the method converges to the correct objective and is competitive with state-of-the-art baselines.

artificial intelligence, gradient, machine learning, (17 more...)

arXiv.org Machine Learning

2605.13434

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)

Add feedback

Ordered Momentum for Asynchronous SGD

Neural Information Processing SystemsMar-18-2026, 20:01:44 GMT

Distributed learning is essential for training large-scale deep models.Asynchronous SGD (ASGD) and its variants are commonly used distributed learning methods, particularly in scenarios where the computing capabilities of workers in the cluster are heterogeneous.Momentum has been acknowledged for its benefits in both optimization and generalization in deep model training. However, existing works have found that naively incorporating momentum into ASGD can impede the convergence.In this paper, we propose a novel method called ordered momentum (OrMo) for ASGD. In OrMo, momentum is incorporated into ASGD by organizing the gradients in order based on their iteration indexes. We theoretically prove the convergence of OrMo with both constant and delay-adaptive learning rates for non-convex problems. To the best of our knowledge, this is the first work to establish the convergence analysis of ASGD with momentum without dependence on the maximum delay. Empirical results demonstrate that OrMo can achieve better convergence performance compared with ASGD and other asynchronous methods with momentum.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Genre: Research Report (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

0e7e2af2e5ba822c9ad35a37b31b5dd4-Paper-Conference.pdf

Neural Information Processing SystemsFeb-7-2026, 22:24:19 GMT

generalization error, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order

Neural Information Processing SystemsNov-21-2025, 10:16:55 GMT

Asynchronous parallel optimization received substantial successes and extensive attention recently. One of core theoretical questions is how much speedup (or benefit) the asynchronous parallelization can bring to us. This paper provides a comprehensive and generic analysis to study the speedup property for a broad range of asynchronous parallel stochastic algorithms from the zeroth order to the first order methods. Our result recovers or improves existing analysis on special cases, provides more insights for understanding the asynchronous parallel behaviors, and suggests a novel asynchronous parallel zeroth order method for the first time. Our experiments provide novel applications of the proposed asynchronous parallel zeroth order method on hyper parameter tuning and model blending problems.

algorithm, central node, optimization, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Yolo County > Davis (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Overview (0.48)
Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

Ordered Momentum for Asynchronous SGD Chang-Wei Shi Yi-Rui Y ang Wu-Jun Li

Neural Information Processing SystemsOct-9-2025, 20:28:38 GMT

Distributed learning is essential for training large-scale deep models.

gradient, ite, momentum, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Add feedback

0e7e2af2e5ba822c9ad35a37b31b5dd4-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 18:37:00 GMT

asgd, assumption, generalization error, (15 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Hierarchical Modeling and Architecture Optimization: Review and Unified Framework

Saves, Paul, Hallé-Hannan, Edward, Bussemaker, Jasper, Diouane, Youssef, Bartoli, Nathalie

arXiv.org Machine LearningJul-1-2025

Simulation-based problems involving mixed-variable inputs frequently feature domains that are hierarchical, conditional, heterogeneous, or tree-structured. These characteristics pose challenges for data representation, modeling, and optimization. This paper reviews extensive literature on these structured input spaces and proposes a unified framework that generalizes existing approaches. In this framework, input variables may be continuous, integer, or categorical. A variable is described as meta if its value governs the presence of other decreed variables, enabling the modeling of conditional and hierarchical structures. We further introduce the concept of partially-decreed variables, whose activation depends on contextual conditions. To capture these inter-variable hierarchical relationships, we introduce design space graphs, combining principles from feature modeling and graph theory. This allows the definition of general hierarchical domains suitable for describing complex system architectures. The framework supports the use of surrogate models over such domains and integrates hierarchical kernels and distances for efficient modeling and optimization. The proposed methods are implemented in the open-source Surrogate Modeling Toolbox (SMT 2.0), and their capabilities are demonstrated through applications in Bayesian optimization for complex system design, including a case study in green aircraft architecture.

data mining, machine learning, pattern recognition, (23 more...)

arXiv.org Machine Learning

2506.22621

Country:

Europe > Austria > Vienna (0.14)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > Canada > Quebec > Montreal (0.04)
(5 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Aerospace & Defense (1.00)
Transportation > Air (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(6 more...)

Add feedback

Ordered Momentum for Asynchronous SGD

Neural Information Processing SystemsMay-26-2025, 18:08:39 GMT

artificial intelligence, asynchronous sgd, machine learning, (5 more...)

Neural Information Processing Systems

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition

Liu, Yuanshi, Zhang, Haihan, Chen, Qian, Fang, Cong

arXiv.org Machine LearningFeb-13-2025

A common pursuit in modern statistical learning is to attain satisfactory generalization out of the source data distribution (OOD). In theory, the challenge remains unsolved even under the canonical setting of covariate shift for the linear model. This paper studies the foundational (high-dimensional) linear regression where the ground truth variables are confined to an ellipse-shape constraint and addresses two fundamental questions in this regime: (i) given the target covariate matrix, what is the min-max \emph{optimal} algorithm under covariate shift? (ii) for what kinds of target classes, the commonly-used SGD-type algorithms achieve optimality? Our analysis starts with establishing a tight lower generalization bound via a Bayesian Cramer-Rao inequality. For (i), we prove that the optimal estimator can be simply a certain linear transformation of the best estimator for the source distribution. Given the source and target matrices, we show that the transformation can be efficiently computed via a convex program. The min-max optimal analysis for SGD leverages the idea that we recognize both the accumulated updates of the applied algorithms and the ideal transformation as preconditions on the learning variables. We provide sufficient conditions when SGD with its acceleration variants attain optimality.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

2502.09047

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Ordered Momentum for Asynchronous SGD

Shi, Chang-Wei, Yang, Yi-Rui, Li, Wu-Jun

arXiv.org Artificial IntelligenceJul-27-2024

Distributed learning is indispensable for training large-scale deep models. Asynchronous SGD (ASGD) and its variants are commonly used distributed learning methods in many scenarios where the computing capabilities of workers in the cluster are heterogeneous. Momentum has been acknowledged for its benefits in both optimization and generalization in deep model training. However, existing works have found that naively incorporating momentum into ASGD can impede the convergence. In this paper, we propose a novel method, called ordered momentum (OrMo), for ASGD. In OrMo, momentum is incorporated into ASGD by organizing the gradients in order based on their iteration indexes. We theoretically prove the convergence of OrMo for non-convex problems. To the best of our knowledge, this is the first work to establish the convergence analysis of ASGD with momentum without relying on the bounded delay assumption. Empirical results demonstrate that OrMo can achieve better convergence performance compared with ASGD and other asynchronous methods with momentum.

gradient, ite, momentum, (13 more...)

arXiv.org Artificial Intelligence

2407.19234

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback