AITopics | d-sgd

Collaborating Authors

d-sgd

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

fef6f971605336724b5e6c0c12dc2534-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 07:12:08 GMT

algorithm, d-distillation, learning, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)

Distributed Distillation for On-Device Learning

Neural Information Processing Systems, Aug-17-2025, 11:37:43 GMT

Transmitting model weights requires huge communication overhead and means only devices with identical model architectures can be included.
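The snippet names two costs of exchanging weights: message size and the need for identical architectures. A common way around both is to exchange predictions on shared reference data instead of parameters. The sketch below assumes that setup; the helper names (`neighbor_targets`, `distillation_loss`, `message_size`) are hypothetical and are not taken from the paper. Each device averages its own and its neighbors' class distributions on the reference examples and uses the result as a distillation target.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one example's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def neighbor_targets(own_probs, neighbor_probs):
    """Average the class distributions predicted on each reference example.

    own_probs: one class-probability list per reference example.
    neighbor_probs: one such list per neighbor; neighbors may use different architectures.
    """
    all_probs = [own_probs] + list(neighbor_probs)
    n_models = len(all_probs)
    targets = []
    for k in range(len(own_probs)):
        n_classes = len(own_probs[k])
        targets.append([sum(p[k][c] for p in all_probs) / n_models for c in range(n_classes)])
    return targets

def distillation_loss(student_probs, targets, eps=1e-12):
    """Mean cross-entropy of the student's predictions against the averaged targets."""
    total = 0.0
    for t, s in zip(targets, student_probs):
        total -= sum(tc * math.log(sc + eps) for tc, sc in zip(t, s))
    return total / len(targets)

def message_size(n_reference, n_classes):
    """Floats sent per round: one distribution per reference example, independent of model size."""
    return n_reference * n_classes

own = [softmax([2.0, 0.5]), softmax([0.1, 1.0])]
neighbors = [[softmax([1.0, 1.0]), softmax([0.0, 2.0])]]
targets = neighbor_targets(own, neighbors)
print(distillation_loss(own, targets))
print(message_size(1000, 10))  # 10000 floats, regardless of the parameter count
```

The message size depends only on the number of reference examples and classes, which is why heterogeneous models can take part.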

algorithm, artificial intelligence, machine learning, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)

We thank the reviewers for their detailed and constructive comments, especially during these unprecedented times

Neural Information Processing Systems, Aug-17-2025, 11:37:32 GMT

We thank the reviewers for their detailed and constructive comments, especially during these unprecedented times. Our algorithm isn't designed to compete (or […] However, in our new experiment in Fig. B we achieve close to D-SGD […] We will add to the paper an experiment with 4 different models. Reference data can be synthetic, in which case it is easy to obtain (as in co-regularization; see R1's comment). We now explain this in detail. The graphs in this work were randomly drawn for a given maximum degree per node.
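The rebuttal says the graphs were drawn randomly for a given maximum degree per node but does not describe the procedure. One simple way to do this, assumed here purely for illustration, is to shuffle all candidate edges and keep each one only if neither endpoint has reached the degree cap. The function name is hypothetical, and this procedure does not guarantee a connected graph.

```python
import itertools
import random

def random_bounded_degree_graph(n_nodes, max_degree, seed=0):
    """Random undirected graph in which no node exceeds max_degree neighbors."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n_nodes), 2))
    rng.shuffle(pairs)  # random edge order decides which edges survive the cap
    adj = {i: set() for i in range(n_nodes)}
    for u, v in pairs:
        if len(adj[u]) < max_degree and len(adj[v]) < max_degree:
            adj[u].add(v)
            adj[v].add(u)
    return adj

graph = random_bounded_degree_graph(10, 3)
print({node: sorted(nbrs) for node, nbrs in graph.items()})
```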

artificial intelligence, experiment, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

Faster Convergence with Less Communication: Broadcast-Based Subgraph Sampling for Decentralized Learning over Wireless Networks

Herrera, Daniel Pérez; Chen, Zheng; Larsson, Erik G.

arXiv.org Artificial Intelligence, Jan-24-2024

Consensus-based decentralized stochastic gradient descent (D-SGD) is a widely adopted algorithm for decentralized training of machine learning models across networked agents. A crucial part of D-SGD is the consensus-based model averaging, which heavily relies on information exchange and fusion among the nodes. Specifically, for consensus averaging over wireless networks, communication coordination is necessary to determine when and how a node can access the channel and transmit (or receive) information to (or from) its neighbors. In this work, we propose $\texttt{BASS}$, a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD while considering the actual communication cost per iteration. $\texttt{BASS}$ creates a set of mixing matrix candidates that represent sparser subgraphs of the base topology. In each consensus iteration, one mixing matrix is sampled, leading to a specific scheduling decision that activates multiple collision-free subsets of nodes. The sampling occurs in a probabilistic manner, and the elements of the mixing matrices, along with their sampling probabilities, are jointly optimized. Simulation results demonstrate that $\texttt{BASS}$ enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods. In conclusion, the inherent broadcasting nature of wireless channels offers intrinsic advantages in accelerating the convergence of decentralized optimization and learning.
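The mechanism the abstract describes, in which one mixing matrix is sampled per consensus iteration from a set of sparser subgraphs, can be sketched as follows. Several choices here are assumptions and not part of BASS: the candidates are two hand-picked matchings of a 4-node ring, they use Metropolis weights, and the sampling probabilities are fixed at 0.5 each rather than jointly optimized as in the paper. The local objectives are simple quadratics, and exact gradients stand in for stochastic ones.

```python
import random

def adjacency(n_nodes, edges):
    """Undirected adjacency sets from an edge list."""
    adj = [set() for _ in range(n_nodes)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def metropolis_weights(adj):
    """Symmetric, doubly stochastic mixing matrix for the given (sub)graph."""
    n = len(adj)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in adj[i]:
            W[i][j] = 1.0 / (1 + max(len(adj[i]), len(adj[j])))
        W[i][i] = 1.0 - sum(W[i])  # self-weight makes each row sum to 1
    return W

def dsgd_sampled_mixing(b, candidates, probs, lr=0.1, iters=300, seed=0):
    """Node i minimizes f_i(x) = 0.5 * (x - b[i])**2; the global optimum is mean(b)."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # Sample one mixing matrix (one scheduling decision) for this iteration.
        W = rng.choices(candidates, weights=probs, k=1)[0]
        # Local gradient step (exact gradients here; D-SGD would use stochastic ones).
        half = [x[i] - lr * (x[i] - b[i]) for i in range(n)]
        # Consensus averaging with the sampled matrix.
        x = [sum(W[i][j] * half[j] for j in range(n)) for i in range(n)]
    return x

# Base topology: ring on 4 nodes, split into two matchings (sparser subgraphs).
sub_a = metropolis_weights(adjacency(4, [(0, 1), (2, 3)]))
sub_b = metropolis_weights(adjacency(4, [(1, 2), (3, 0)]))
b = [1.0, 2.0, 3.0, 4.0]
x = dsgd_sampled_mixing(b, [sub_a, sub_b], probs=[0.5, 0.5])
print(sum(x) / len(x))  # average model, approximately 2.5 = mean(b)
```

Every sampled matrix is doubly stochastic, so the network average is preserved by each consensus step, even though each step activates only part of the ring.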

iteration, matrix, subset, (16 more...)

arXiv.org Artificial Intelligence

2401.13779

Country:

Europe > Sweden > Östergötland County > Linköping (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Decentralized SGD and Average-direction SAM are Asymptotically Equivalent

Zhu, Tongtian; He, Fengxiang; Chen, Kaixuan; Song, Mingli; Tao, Dacheng

arXiv.org Machine Learning, Nov-9-2023

Decentralized stochastic gradient descent (D-SGD) allows collaborative learning across a massive number of devices simultaneously without the control of a central server. However, existing theories claim that decentralization invariably undermines generalization. In this paper, we challenge the conventional belief and present a completely new perspective for understanding decentralized learning. We prove that D-SGD implicitly minimizes the loss function of an average-direction Sharpness-aware minimization (SAM) algorithm under general non-convex non-$\beta$-smooth settings. This surprising asymptotic equivalence reveals an intrinsic regularization-optimization trade-off and three advantages of decentralization: (1) there exists a free uncertainty evaluation mechanism in D-SGD to improve posterior estimation; (2) D-SGD exhibits a gradient smoothing effect; and (3) the sharpness regularization effect of D-SGD does not decrease as total batch size increases, which justifies the potential generalization benefit of D-SGD over centralized SGD (C-SGD) in large-batch scenarios. The code is available at https://github.com/Raiden-Zhu/ICML-2023-DSGD-and-SAM.
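The link between consensus spread and a sharpness penalty can be seen in a toy case. This is not the paper's argument, which covers general non-convex losses. For a one-dimensional quadratic $L(w) = \tfrac{a}{2} w^2$ and local models $w_j = \bar w + \varepsilon_j$ with $\sum_j \varepsilon_j = 0$, the loss averaged over local models is exactly $L(\bar w) + \tfrac{a}{2} \cdot \tfrac{1}{n} \sum_j \varepsilon_j^2$. The extra term grows with both the curvature $a$ and the consensus distance. The numbers below are arbitrary and chosen only for illustration.

```python
import math

def quadratic_loss(w, curvature):
    """L(w) = 0.5 * a * w**2 with curvature (sharpness) a."""
    return 0.5 * curvature * w * w

def mean_local_loss(local_models, curvature):
    """Loss averaged over the local models w_j = w_bar + eps_j."""
    return sum(quadratic_loss(w, curvature) for w in local_models) / len(local_models)

def decompose(local_models, curvature):
    """Split the averaged loss into L(w_bar) plus a curvature-weighted spread penalty."""
    n = len(local_models)
    w_bar = sum(local_models) / n
    spread = sum((w - w_bar) ** 2 for w in local_models) / n  # consensus distance
    return quadratic_loss(w_bar, curvature), 0.5 * curvature * spread

models = [0.8, 1.0, 1.3, 0.9]
at_mean, penalty = decompose(models, curvature=3.0)
print(mean_local_loss(models, 3.0), at_mean + penalty)  # equal up to rounding
```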

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Machine Learning

2306.02913

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Decentralized Learning over Wireless Networks: The Effect of Broadcast with Random Access

Chen, Zheng; Dahl, Martin; Larsson, Erik G.

arXiv.org Artificial Intelligence, Jul-7-2023

In this work, we focus on the communication aspect of decentralized learning, which involves multiple agents training a shared machine learning model using decentralized stochastic gradient descent (D-SGD) over distributed data. In particular, we investigate the impact of broadcast transmission and probabilistic random access policy on the convergence performance of D-SGD, considering the broadcast nature of wireless channels and the link dynamics in the communication topology. Our results demonstrate that optimizing the access probability to maximize the expected number of successful links is a highly effective strategy for accelerating the system convergence.
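The strategy of choosing the access probability to maximize the expected number of successful links can be sketched under a simple collision model. The model is assumed here and may differ from the paper's. Each node broadcasts independently with probability p. A directed link i → j succeeds only if i transmits, j listens, and no other neighbor of j transmits. The per-link success probability is then p(1 − p)^deg(j), and node j has deg(j) incoming links. The function names are hypothetical.

```python
def expected_successful_links(degrees, p):
    """Sum over receivers j of deg(j) * p * (1 - p)**deg(j) under the assumed collision model."""
    # i transmits (p), j listens (1 - p), and the other deg(j) - 1 neighbors stay silent.
    return sum(d * p * (1 - p) ** d for d in degrees)

def best_access_probability(degrees, grid=1000):
    """Grid search over p in (0, 1) for the maximizer of the expected link count."""
    candidates = [k / grid for k in range(1, grid)]
    return max(candidates, key=lambda p: expected_successful_links(degrees, p))

ring_degrees = [2, 2, 2, 2]
p_star = best_access_probability(ring_degrees)
print(p_star, expected_successful_links(ring_degrees, p_star))
```

For a d-regular topology, maximizing p(1 − p)^d gives p* = 1/(d + 1), so the 4-node ring (d = 2) yields roughly 1/3. Irregular topologies have no such closed form under this model, and the grid search handles them directly.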

artificial intelligence, machine learning, probability, (17 more...)

arXiv.org Artificial Intelligence

2305.07368

Country: Europe > Sweden > Östergötland County > Linköping (0.05)

Genre: Research Report > New Finding (0.68)

Industry: Commercial Services & Supplies > Security & Alarm Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Improved Stability and Generalization Analysis of the Decentralized SGD Algorithm

Bars, Batiste Le; Bellet, Aurélien; Tommasi, Marc

arXiv.org Artificial Intelligence, Jun-5-2023

This paper presents a new generalization error analysis for the Decentralized Stochastic Gradient Descent (D-SGD) algorithm based on algorithmic stability. The obtained results largely improve upon state-of-the-art results, and even invalidate their claims that the communication graph has a detrimental effect on generalization. For instance, we show that in convex settings, D-SGD has the same generalization bounds as the classical SGD algorithm, no matter the choice of graph. We show that this counter-intuitive result comes from considering the average of local parameters, which hides a final global averaging step incompatible with the decentralized scenario. In light of this observation, we advocate analyzing the supremum over local parameters and show that in this case, the graph does have an impact on the generalization. Unlike prior results, our analysis yields non-vacuous bounds even for non-connected graphs.
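The distinction between the averaged parameter and the supremum over local parameters can be illustrated with a small stability experiment. This toy setup is assumed for illustration and is not the authors' analysis. It runs scalar D-SGD twice on datasets that differ in a single sample at node 0, then compares the gap between the averaged models with the largest gap at any node. With no communication (identity mixing), averaging divides node 0's change by n and hides it. With full averaging, the two measures coincide.

```python
def run_dsgd(data, W, lr=0.1, epochs=20):
    """Scalar D-SGD; node i holds data[i] and uses the per-sample loss 0.5 * (x - y)**2."""
    n = len(data)
    x = [0.0] * n
    steps = epochs * len(data[0])
    for t in range(steps):
        # Cyclic sample order, identical in both runs, so the runs stay coupled.
        half = [x[i] - lr * (x[i] - data[i][t % len(data[i])]) for i in range(n)]
        x = [sum(W[i][j] * half[j] for j in range(n)) for i in range(n)]
    return x

def stability_gaps(data, data_prime, W, **kwargs):
    """Compare the averaged-parameter gap with the supremum over local parameters."""
    x = run_dsgd(data, W, **kwargs)
    x_prime = run_dsgd(data_prime, W, **kwargs)
    n = len(x)
    avg_gap = abs(sum(x) / n - sum(x_prime) / n)
    sup_gap = max(abs(a - b) for a, b in zip(x, x_prime))
    return avg_gap, sup_gap

# Neighboring datasets: only node 0's second sample differs.
data = [[1.0, 2.0], [0.5, 1.5], [2.5, 3.0]]
data_prime = [[1.0, 5.0], [0.5, 1.5], [2.5, 3.0]]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # no communication
complete = [[1.0 / 3] * 3 for _ in range(3)]  # full averaging every step
for name, W in [("disconnected", identity), ("complete", complete)]:
    print(name, stability_gaps(data, data_prime, W))
```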

artificial intelligence, generalization error, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2306.02939

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

Robust Collaborative Learning with Linear Gradient Overhead

Farhadkhani, Sadegh; Guerraoui, Rachid; Gupta, Nirupam; Hoang, Lê Nguyên; Pinot, Rafael; Stephan, John

arXiv.org Artificial Intelligence, Jun-3-2023

Collaborative learning algorithms, such as distributed SGD (or D-SGD), are vulnerable to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak's momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing, respectively. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(\alpha, \lambda)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.
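The two building blocks named in the abstract, momentum on local gradients and nearest-neighbor averaging, can be sketched as follows. Both details are assumptions made for illustration. NNA is taken to mean that a node keeps the n − f vectors closest to its own, counting itself among the n, and averages them. Momentum uses the convention m ← βm + (1 − β)g. For the exact definitions, consult the linked repository. The function names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_neighbor_average(own, received, f):
    """Average the n - f vectors closest to `own` among own + received (n = len(received) + 1)."""
    vectors = [own] + list(received)
    n = len(vectors)
    if not 0 <= f < n:
        raise ValueError("need 0 <= f < n")
    closest = sorted(vectors, key=lambda v: sq_dist(v, own))[: n - f]  # drop the f farthest
    dim = len(own)
    return [sum(v[k] for v in closest) / len(closest) for k in range(dim)]

def momentum_update(m, g, beta=0.9):
    """Polyak-style momentum on a local gradient: m <- beta * m + (1 - beta) * g."""
    return [beta * mk + (1 - beta) * gk for mk, gk in zip(m, g)]

# One faulty neighbor sends an outlier; with f = 1 it is excluded from the average.
mixed = nearest_neighbor_average([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [100.0, 100.0]], f=1)
print(mixed)  # the outlier [100, 100] is discarded
```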

artificial intelligence, machine learning, nullx, (14 more...)

arXiv.org Artificial Intelligence

2209.10931

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(5 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Collaboration (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
