AITopics | consensus error

Revisiting Consensus Error: A Fine-grained Analysis of Local SGD under Second-order Data Heterogeneity

Neural Information Processing Systems, Jun-17-2026, 08:56:25 GMT

Local SGD, or Federated Averaging, is one of the most widely used algorithms for distributed optimization. Although it often outperforms alternatives such as mini-batch SGD, existing theory has not fully explained this advantage under realistic assumptions about data heterogeneity. Recent work has suggested that a second-order heterogeneity assumption may suffice to justify the empirical gains of local SGD. We confirm this conjecture by establishing new upper and lower bounds on the convergence of local SGD. These bounds demonstrate how a low second-order heterogeneity, combined with third-order smoothness, enables local SGD to interpolate between heterogeneous and homogeneous regimes while maintaining communication efficiency. Our main technical contribution is a refined analysis of the consensus error, a central quantity in such results. We validate our theory with experiments on a distributed linear regression task.
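The consensus error in this abstract measures how far the clients' local iterates drift from their average between communication rounds. Below is a minimal sketch of local SGD on a toy scalar least-squares problem, assuming one scalar parameter per model and a hypothetical per-client data layout (`clients` as lists of `(a, b)` pairs); the names `local_sgd` and `local_grad` are illustrative, and this is not the authors' experimental code. All clients here share the same curvature and differ only in their optima, so the toy exhibits first-order heterogeneity rather than the second-order setting the paper analyzes.

```python
import random


def local_grad(x, a, b):
    # Gradient of 0.5 * (a * x - b) ** 2 with respect to x.
    return a * (a * x - b)


def local_sgd(clients, rounds, local_steps, lr, seed=0):
    """Sketch of local SGD (FedAvg) on scalar least squares.

    clients: list of lists of (a, b) pairs, one list per client (assumed layout).
    Returns the final averaged iterate and the consensus error of each round.
    """
    rng = random.Random(seed)
    x_bar = 0.0
    consensus_errors = []
    for _ in range(rounds):
        local_iterates = []
        for data in clients:
            x = x_bar  # every client starts the round from the shared iterate
            for _ in range(local_steps):
                a, b = rng.choice(data)  # sample one data point: stochastic gradient
                x -= lr * local_grad(x, a, b)
            local_iterates.append(x)
        new_bar = sum(local_iterates) / len(local_iterates)
        # Consensus error: mean squared distance of local iterates from their average.
        consensus_errors.append(
            sum((x - new_bar) ** 2 for x in local_iterates) / len(local_iterates)
        )
        x_bar = new_bar  # communication step: average the local models
    return x_bar, consensus_errors


if __name__ == "__main__":
    clients = [[(1.0, 1.0)], [(1.0, 3.0)]]  # two clients with optima 1 and 3
    for k in (1, 10):
        x_final, errors = local_sgd(clients, rounds=50, local_steps=k, lr=0.5)
        print(k, x_final, errors[-1])
```

For these two clients with `lr=0.5`, each round's consensus error works out to `(1 - 0.5**K)**2`, where `K` is `local_steps`: more local steps let the clients drift further apart before averaging, while the averaged iterate still approaches 2, the minimizer of the average objective.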

artificial intelligence, machine learning, xmt 1

Neural Information Processing Systems

Country: North America > United States (0.92)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

01db36a646c07c64dd39a92b4eceb417-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 07:38:40 GMT

apple 2, artificial intelligence, machine learning

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

a5b93aaec935a59987f8a5f2280e7cd7-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-13-2026, 09:40:08 GMT

experiment, geometric analysis, objective function

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.67)

Dynamic Leader-Follower Consensus with Adversaries: A Multi-Hop Relay Approach

Yuan, Liwei; Ishii, Hideaki

arXiv.org Artificial Intelligence, Nov-25-2025

Within this area, resilient consensus problems have gained substantial attention across the disciplines of systems control, distributed computing, and robotics (Vaidya et al., 2012; Sundaram and Gharesifard, 2018; Yu et al., 2022). Here, the objective of the nonfaulty, normal agents is to reach consensus despite the misbehavior of adversarial agents. Existing resilient consensus algorithms are designed to ensure that normal agents reach consensus on a value within the convex hull of their initial states (e.g., Yuan and Ishii, 2021, 2023; Yu et al., 2022). Meanwhile, numerous formation control and reliable broadcast problems require agents to reach consensus on a predetermined reference value, which may lie inside or outside that convex hull (Bullo et al.

This work was supported in part by the National Natural Science Foundation of China under Grant 62403188 and in part by JSPS under Grants-in-Aid for Scientific Research Grant No. 22H01508 and 24K00844. The material in this paper was not presented at any conference.
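The convex-hull guarantee of existing resilient consensus algorithms can be made concrete with one classical scheme, a mean-subsequence-reduced (MSR) style update. This is a swapped-in illustration, not the multi-hop relay method proposed in this paper. Assumptions: a complete communication graph, at most `f` adversaries that broadcast the same fixed value to every agent, and hypothetical helper names `msr_update` and `simulate`. Each normal agent discards up to `f` neighbor values above its own and up to `f` below, then averages what remains with its own value.

```python
def msr_update(own, neighbor_values, f):
    """One MSR-style update for a normal agent (hypothetical helper)."""
    larger = sorted(v for v in neighbor_values if v > own)
    smaller = sorted(v for v in neighbor_values if v < own)
    equal = [v for v in neighbor_values if v == own]
    # Drop the f largest values above own (all of them if there are fewer than f).
    kept_larger = larger[:max(len(larger) - f, 0)]
    # Drop the f smallest values below own (all of them if there are fewer than f).
    kept_smaller = smaller[f:]
    pool = kept_larger + kept_smaller + equal + [own]
    return sum(pool) / len(pool)


def simulate(normal_values, adversary_values, f, steps):
    """Synchronous MSR iterations on a complete graph (assumption).

    Adversaries are modeled as agents that broadcast fixed values every round.
    """
    x = list(normal_values)
    for _ in range(steps):
        new_x = []
        for i, own in enumerate(x):
            # Each normal agent hears every other normal agent plus all adversaries.
            neighbors = x[:i] + x[i + 1:] + list(adversary_values)
            new_x.append(msr_update(own, neighbors, f))
        x = new_x
    return x


if __name__ == "__main__":
    final = simulate([0.0, 1.0, 2.0, 3.0, 4.0], [100.0], f=1, steps=100)
    print(final)
```

With normal agents starting at 0 through 4 and one adversary stuck at 100, the normal agents agree on a value inside [0, 4] instead of being dragged toward 100. Reaching a reference value outside that hull, as formation control and reliable broadcast require, needs a different mechanism.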

artificial intelligence, consensus, node

arXiv.org Artificial Intelligence

arXiv:2511.19327

Country: Asia > China (0.24)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Communications > Networks (0.93)

9a1de01f893e0d2551ecbb7ce4dc963e-Supplemental.pdf

Neural Information Processing Systems, Nov-15-2025, 06:16:33 GMT

dataset, edit distance, sequence

Neural Information Processing Systems

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Implicit Regularization of Decentralized Gradient Descent for Decentralized Sparse Regression

Neural Information Processing Systems, Oct-9-2025, 20:21:58 GMT

We consider learning a sparse model from linear measurements taken by a network of agents.
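The decentralized gradient descent (DGD) update named in this title mixes each agent's iterate with its neighbors' iterates through a doubly stochastic weight matrix and then takes a local gradient step. Below is a minimal sketch of plain DGD on a toy scalar problem; the ring topology, the weights, the step size, and the names `dgd` and `grads` are assumptions for illustration, and the sketch does not reproduce the sparse-regression setting or the implicit-regularization mechanism the paper studies.

```python
def dgd(W, grads, x0, lr, steps):
    """Plain DGD: x_i <- sum_j W[i][j] * x_j - lr * grad_i(x_i)."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        # Gossip step: weighted average of neighbors' current iterates.
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local step: each agent uses only its own gradient, evaluated at its own iterate.
        x = [mixed[i] - lr * grads[i](x[i]) for i in range(n)]
    return x


if __name__ == "__main__":
    third = 1.0 / 3.0
    # Ring of 4 agents; each keeps 1/3 for itself and gives 1/3 to each neighbor.
    W = [[third, third, 0.0, third],
         [third, third, third, 0.0],
         [0.0, third, third, third],
         [third, 0.0, third, third]]
    targets = [1.0, 2.0, 3.0, 4.0]
    # Local loss of agent i: 0.5 * (x - c_i) ** 2, so its gradient is x - c_i.
    grads = [lambda x, c=c: x - c for c in targets]
    print(dgd(W, grads, [0.0] * 4, lr=0.1, steps=500))
```

With a constant step size, the agents reach a neighborhood of consensus rather than exact agreement, but because `W` is doubly stochastic their average still converges to 2.5, the minimizer of the average of the local losses.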

hypothesis, inequality, regularization

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

DeMuon: A Decentralized Muon for Matrix Optimization over Graphs

He, Chuan; Ren, Shuyi; Mao, Jingwei; Larsson, Erik G.

arXiv.org Artificial Intelligence, Oct-3-2025

In this paper, we propose DeMuon, a method for decentralized matrix optimization over a given communication topology. DeMuon incorporates matrix orthogonalization via Newton-Schulz iterations (a technique inherited from its centralized predecessor, Muon) and employs gradient tracking to mitigate heterogeneity among local functions. Under heavy-tailed noise conditions and additional mild assumptions, we establish the iteration complexity of DeMuon for reaching an approximate stochastic stationary point. This complexity result matches the best-known complexity bounds of centralized algorithms in terms of dependence on the target tolerance. To the best of our knowledge, DeMuon is the first direct extension of Muon to decentralized optimization over graphs with provable complexity guarantees. We conduct preliminary numerical experiments on decentralized transformer pretraining over graphs with varying degrees of connectivity. Our numerical results show a clear margin of improvement of DeMuon over other popular decentralized algorithms across different network topologies.
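Newton-Schulz orthogonalization drives every singular value of a matrix toward 1 while keeping its singular vectors, so for a full-rank G = U S V^T it approximates the orthogonal polar factor U V^T. The sketch below uses the classical cubic iteration X <- 1.5 X - 0.5 X X^T X in pure Python; Muon is reportedly implemented with a tuned higher-order polynomial instead, and DeMuon's gradient-tracking step is not shown. The function names and the fixed step count are assumptions. Scaling by the Frobenius norm first keeps all singular values at most 1, inside the region (0, sqrt(3)) where the cubic iteration converges.

```python
def matmul(A, B):
    # Plain triple-loop matrix product for lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def newton_schulz_orthogonalize(G, steps=30):
    """Cubic Newton-Schulz iteration toward the orthogonal polar factor of G (sketch)."""
    norm = sum(v * v for row in G for v in row) ** 0.5
    X = [[v / norm for v in row] for row in G]  # now every singular value is <= 1
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        # Each singular value s is mapped to 1.5 * s - 0.5 * s ** 3, which has a fixed point at 1.
        X = [[1.5 * X[i][j] - 0.5 * XXtX[i][j] for j in range(len(X[0]))]
             for i in range(len(X))]
    return X


if __name__ == "__main__":
    O = newton_schulz_orthogonalize([[1.0, 2.0], [3.0, 4.0]])
    print(O)
    print(matmul(O, transpose(O)))  # should be close to the identity
```

Zero singular values are fixed points of the map, so for rank-deficient inputs the result is only a partial isometry rather than an orthogonal matrix.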

artificial intelligence, arxiv preprint arxiv, machine learning

arXiv.org Artificial Intelligence

arXiv:2510.01377

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A1: We initially presented only experiments for distributed matrix factorization

Neural Information Processing Systems, Aug-19-2025, 22:38:02 GMT

We will incorporate these experiments into the final paper.

artificial intelligence, machine learning, objective function

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.67)

f1ea154c843f7cf3677db7ce922a2d17-Paper.pdf

Neural Information Processing Systems, Aug-17-2025, 06:12:29 GMT

artificial intelligence, machine learning, stationary point

Neural Information Processing Systems

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

A Limitations and future work We believe that the

Neural Information Processing Systems, Aug-16-2025, 07:28:59 GMT

All real-world datasets analysed consist of sequence reads of the same part of the genome. This is a widespread set-up for sequence analysis, but not a ubiquitous one. In this project, we work with edit distances between sequences; these are too expensive to compute for large-scale analysis, but it is feasible to compute enough of them to produce a sufficiently large training set. We describe here the methods that are most closely related to our work. However, these are bound by quadratic complexity w.r.t. the length of the input sequence, the best algorithm [ Experiments were also run on synthetic datasets of randomly generated sequences.
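The quadratic cost mentioned above is visible in the textbook Wagner-Fischer dynamic program for the unit-cost edit (Levenshtein) distance, which fills one table cell per pair of positions, i.e., O(n * m) work for sequences of lengths n and m. A minimal sketch follows, keeping only two rows so memory is O(m); the name `levenshtein` is illustrative, and this is not necessarily the best algorithm the excerpt goes on to cite.

```python
def levenshtein(s, t):
    """Unit-cost edit distance (insert, delete, substitute) via Wagner-Fischer DP."""
    # prev[j] = distance between the current prefix of s and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free when characters match)
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("ACGT", "AGT"))
```

The nested loops make the n * m cell updates explicit, which is why computing all pairwise distances for a large set of long reads quickly becomes expensive.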

artificial intelligence, machine learning, sequence

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.50)

Technology: