AITopics | easgd

Collaborating Authors

easgd

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsFeb-8-2025, 03:24:05 GMT

This paper studies general-purpose training algorithms for deep learning and proposes a family of algorithms called elastic averaging SGD. The idea is novel and the paper is of very high quality. The paper focuses on training large-scale deep learning models under communication constraints. This problem is difficult since there are many local optima in non-convex problems like in deep learning. The optimization problem is formulated as a global variable consensus problem such that local workers would not fall into different local optima, and then its gradient update rules are reinterpreted using the elastic forces between local and global parameters.

algorithm, author feedback and meta-review, easgd, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Dynamic Weighting Strategy to Mitigate Worker Node Failure in Distributed Deep Learning

Xu, Yuesheng, Carr, Arielle

arXiv.org Artificial IntelligenceSep-13-2024

The increasing complexity of deep learning models and the demand for processing vast amounts of data make the utilization of large-scale distributed systems for efficient training essential. These systems, however, face significant challenges such as communication overhead, hardware limitations, and node failure. This paper investigates various optimization techniques in distributed deep learning, including Elastic Averaging SGD (EASGD) and the second-order method AdaHessian. We propose a dynamic weighting strategy to mitigate the problem of straggler nodes due to failure, enhancing the performance and efficiency of the overall training process. We conduct experiments with different numbers of workers and communication periods to demonstrate improved convergence rates and test performance using our strategy.

master node, node, worker node, (12 more...)

arXiv.org Artificial Intelligence

2409.09242

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models

Teng, Yunfei, Gao, Wenbo, Chalus, Francois, Choromanska, Anna, Goldfarb, Donald, Weller, Adrian

arXiv.org Machine LearningMay-24-2019

We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only parameters of the leader rather than all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architecture, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines.

experiment, minimizer, stationary point, (16 more...)

arXiv.org Machine Learning

1905.10395

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Elastic Gossip: Distributing Neural Network Training Using Gossip-like Protocols

Pramod, Siddharth

arXiv.org Machine LearningDec-6-2018

Distributing Neural Network training is of particular interest for several reasons including scaling using computing clusters, training at data sources such as IOT devices and edge servers, utilizing underutilized resources across heterogeneous environments, and so on. Most contemporary approaches primarily address scaling using computing clusters and require high network bandwidth and frequent communication. This thesis presents an overview of standard approaches to distribute training and proposes a novel technique involving pairwise-communication using Gossip-like protocols, called Elastic Gossip. This approach builds upon an existing technique known as Elastic Averaging SGD (EASGD), and is similar to another technique called Gossiping SGD which also uses Gossip-like protocols. Elastic Gossip is empirically evaluated against Gossiping SGD using the MNIST digit recognition and CIFAR-10 classification tasks, using commonly used Neural Network architectures spanning Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). It is found that Elastic Gossip, Gossiping SGD, and All-reduce SGD perform quite comparably, even though the latter entails a substantially higher communication cost. While Elastic Gossip performs better than Gossiping SGD in these experiments, it is possible that a more thorough search over hyper-parameter space, specific to a given application, may yield configurations of Gossiping SGD that work better than Elastic Gossip.

artificial intelligence, elastic gossip, machine learning, (17 more...)

arXiv.org Machine Learning

1812.02407

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback