consensus error
Revisiting Consensus Error: A Fine-grained Analysis of Local SGD under Second-order Data Heterogeneity
Local SGD, or Federated Averaging, is one of the most widely used algorithms for distributed optimization. Although it often outperforms alternatives such as mini-batch SGD, existing theory has not fully explained this advantage under realistic assumptions about data heterogeneity. Recent work has suggested that a second-order heterogeneity assumption may suffice to justify the empirical gains of local SGD. We confirm this conjecture by establishing new upper and lower bounds on the convergence of local SGD. These bounds demonstrate how low second-order heterogeneity, combined with third-order smoothness, enables local SGD to interpolate between heterogeneous and homogeneous regimes while maintaining communication efficiency. Our main technical contribution is a refined analysis of the consensus error, a central quantity in such results. We validate our theory with experiments on a distributed linear regression task.
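To make the quantities in this abstract concrete, the following is a minimal sketch of local SGD on a toy distributed least-squares problem, with the consensus error recorded at every communication round. It is not the paper's experimental setup: the helper `make_worker_data`, the function `local_sgd`, and all parameter names and defaults are assumptions chosen for illustration. The `heterogeneity` knob shifts each worker's optimum, which is a simpler notion than the second-order heterogeneity studied in the abstract. The consensus error is taken here to be the mean squared distance of the local models from their average, measured just before averaging.

```python
import random


def make_worker_data(num_workers, samples_per_worker, w_star, heterogeneity=0.0, seed=0):
    """Hypothetical helper: noiseless linear data, one dataset per worker.

    Each worker's labels come from w_star plus a worker-specific shift whose
    size is controlled by `heterogeneity` (0.0 means identical optima).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(num_workers):
        w_local = [w + heterogeneity * rng.uniform(-1.0, 1.0) for w in w_star]
        samples = []
        for _ in range(samples_per_worker):
            x = [rng.uniform(-1.0, 1.0) for _ in w_star]
            y = sum(wi * xi for wi, xi in zip(w_local, x))
            samples.append((x, y))
        data.append(samples)
    return data


def local_sgd(worker_data, rounds=50, local_steps=5, lr=0.05, seed=0):
    """Run local SGD on per-worker least-squares objectives.

    Returns the final averaged model and the consensus error recorded
    just before each averaging (communication) step.
    """
    rng = random.Random(seed)
    dim = len(worker_data[0][0][0])
    num_workers = len(worker_data)
    global_w = [0.0] * dim
    consensus_errors = []
    for _ in range(rounds):
        local_models = []
        for data in worker_data:
            w = list(global_w)  # every worker starts the round from the shared model
            for _ in range(local_steps):
                x, y = rng.choice(data)  # one stochastic sample
                residual = sum(wi * xi for wi, xi in zip(w, x)) - y
                w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
            local_models.append(w)
        # Communication step: average the local models.
        global_w = [sum(m[i] for m in local_models) / num_workers for i in range(dim)]
        # Consensus error: mean squared distance of local models from their average.
        consensus_errors.append(
            sum(sum((m[i] - global_w[i]) ** 2 for i in range(dim)) for m in local_models)
            / num_workers
        )
    return global_w, consensus_errors


if __name__ == "__main__":
    data = make_worker_data(4, 20, [2.0, -1.0], heterogeneity=0.5, seed=1)
    w, errs = local_sgd(data, rounds=100, local_steps=5, lr=0.1, seed=1)
    print("averaged model:", w)
    print("last consensus error:", errs[-1])
```

Raising `local_steps` reduces how often workers communicate but lets their models drift further apart between averaging steps, which is the drift the consensus error measures.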
Dynamic Leader-Follower Consensus with Adversaries: A Multi-Hop Relay Approach
Within this area, resilient consensus problems have gained substantial attention across the disciplines of systems control, distributed computing, and robotics (Vaidya et al. (2012); Sundaram and Gharesifard (2018); Yu et al. (2022)). Here, the objective for the nonfaulty, normal agents is to reach consensus despite misbehaviors of adversarial agents. Existing resilient consensus algorithms are designed to ensure that normal agents reach consensus on a value within the convex hull of their initial states, e.g., Yuan and Ishii (2021, 2023); Yu et al. (2022). Meanwhile, numerous formation control and reliable broadcast problems require agents to reach consensus on a predetermined reference value, which may lie inside or outside that convex hull (Bullo et al.
This work was supported in part by the National Natural Science Foundation of China under Grant 62403188 and in part by JSPS under Grants-in-Aid for Scientific Research Grant No. 22H01508 and 24K00844. The material in this paper was not presented at any conference.
DeMuon: A Decentralized Muon for Matrix Optimization over Graphs
He, Chuan, Ren, Shuyi, Mao, Jingwei, Larsson, Erik G.
In this paper, we propose DeMuon, a method for decentralized matrix optimization over a given communication topology. DeMuon incorporates matrix orthogonalization via Newton-Schulz iterations (a technique inherited from its centralized predecessor, Muon) and employs gradient tracking to mitigate heterogeneity among local functions. Under heavy-tailed noise conditions and additional mild assumptions, we establish the iteration complexity of DeMuon for reaching an approximate stochastic stationary point. This complexity result matches the best-known complexity bounds of centralized algorithms in terms of dependence on the target tolerance. To the best of our knowledge, DeMuon is the first direct extension of Muon to decentralized optimization over graphs with provable complexity guarantees. We conduct preliminary numerical experiments on decentralized transformer pretraining over graphs with varying degrees of connectivity. Our numerical results demonstrate a clear margin of improvement of DeMuon over other popular decentralized algorithms across different network topologies.
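As an illustration of the orthogonalization step mentioned in this abstract, here is a minimal sketch of the classical cubic Newton-Schulz iteration, X ← 1.5 X − 0.5 X Xᵀ X, applied to a Frobenius-normalized matrix. This is the textbook variant and is not necessarily the polynomial, step count, or normalization used by Muon or DeMuon. The function names and the default `steps=20` are assumptions made for this example, and plain Python lists are used only to keep the sketch dependency-free.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=20):
    """Approximate the orthogonal polar factor of g.

    Uses the classical cubic iteration X <- 1.5 X - 0.5 X X^T X, which pushes
    every nonzero singular value of X toward 1. Dividing by the Frobenius norm
    first puts all singular values in (0, 1], inside the convergence region.
    """
    fro = sum(v * v for row in g for v in row) ** 0.5
    if fro == 0.0:
        raise ValueError("cannot orthogonalize the zero matrix")
    x = [[v / fro for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(row_x, row_c)]
             for row_x, row_c in zip(x, xxt_x)]
    return x


if __name__ == "__main__":
    g = [[2.0, 0.0, 1.0], [0.0, 1.0, 3.0]]
    q = newton_schulz_orthogonalize(g)
    # For a full-rank 2x3 input, q times its transpose should be close to the 2x2 identity.
    print(matmul(q, transpose(q)))
```

Very small singular values grow by only about a factor of 1.5 per step at first, so badly conditioned inputs may need more than the default number of steps.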
A Limitations and future work We believe that the
All real-world datasets analysed consist of sequence reads of the same part of the genome. This is a widespread set-up for sequence analysis but not ubiquitous. In this project, we work with edit distances between sequences; these are too expensive for large-scale analysis, but it is feasible to produce a large enough training set. We describe here the methods that are most closely related to our work. However, these are bound to a quadratic complexity w.r.t. the length of the input sequence, the best algorithm [ Experiments were also run on synthetic datasets formed by randomly generated sequences.
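To illustrate why exact edit distances become expensive at scale, here is a minimal sketch of the standard dynamic program for Levenshtein distance, with unit costs for insertion, deletion, and substitution. The function name `edit_distance` is chosen for this example, and the excerpt does not say which edit-distance variant or implementation the authors used. The nested loops visit every pair of positions, which is the quadratic cost in sequence length referred to above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, O(len(a) * len(b)) time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGTTGCA", "ACGTGCA"))
```

Only two rows of the table are kept at a time, so memory is linear in the shorter sequence even though the running time stays quadratic.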