exponential graph
74e1ed8b55ea44fd7dbb685c412568a4-Supplemental.pdf
This bound is attained if $n$ is an even number, in which case $\lambda_{n/2}$ is the desired eigenvalue. Based on the numerical experiment, we know that if $n$ is an odd number, this bound cannot be attained. The ring topology is undirected, and is illustrated in Figure 1(a). The star topology is undirected, and is illustrated in Figure 1(b). Its weight matrix is generated according to the Metropolis rule, which is symmetric. The 2D-grid topology is undirected, and is illustrated in Figure 1(c).
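The snippet above does not spell out the Metropolis rule. A minimal sketch follows, assuming the common Metropolis-Hastings form in which an edge $(i, j)$ gets weight $1/(1 + \max(d_i, d_j))$ and each diagonal entry absorbs the remainder of its row. The function name `metropolis_weights` and the 6-node star (node 0 as the hub) are illustrative choices, not taken from the supplemental.

```python
def metropolis_weights(n, edges):
    """Build a symmetric weight matrix for an undirected graph.

    Assumed rule (not quoted from the source): w_ij = 1 / (1 + max(d_i, d_j))
    for every edge (i, j), and w_ii = 1 - sum of the other entries in row i.
    """
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1

    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = 1.0 / (1 + max(degree[i], degree[j]))
        W[i][j] = w
        W[j][i] = w  # undirected edge, so the matrix stays symmetric

    for i in range(n):
        W[i][i] = 1.0 - sum(W[i][j] for j in range(n) if j != i)
    return W


# Hypothetical 6-node star: node 0 is the hub, nodes 1..5 are leaves.
star_edges = [(0, leaf) for leaf in range(1, 6)]
W_star = metropolis_weights(6, star_edges)
```

Under this assumed rule, every edge of the 6-node star carries weight 1/6, the hub keeps 1/6 for itself, and each leaf keeps 5/6, so every row sums to one.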
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
Exponential Graph is Provably Efficient for Decentralized Deep Training
Decentralized SGD is an emerging training method for deep learning, known for requiring much less (and thus faster) communication per iteration: it relaxes the exact averaging step of parallel SGD to inexact averaging. The less exact the averaging is, however, the more total iterations the training needs. Therefore, the key to making decentralized SGD efficient is to realize nearly exact averaging using little communication. This requires a skillful choice of communication topology, which is an under-studied topic in decentralized optimization. In this paper, we study so-called exponential graphs, in which every node is connected to $O(\log(n))$ neighbors and $n$ is the total number of nodes. This work proves that such graphs can achieve fast communication and effective averaging simultaneously. We also discover that a sequence of $\log(n)$ one-peer exponential graphs, in which each node communicates with a single neighbor per iteration, can together achieve exact averaging. This favorable property enables the one-peer exponential graph to average as effectively as its static counterpart while communicating more efficiently. We apply these exponential graphs in decentralized (momentum) SGD to obtain the state-of-the-art balance between per-iteration communication and iteration complexity among all commonly used topologies. Experimental results on a variety of tasks and models demonstrate that decentralized (momentum) SGD over exponential graphs promises both fast and high-quality training.
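The exact-averaging claim can be checked numerically on a small case. The abstract does not give the weight matrices, so the sketch below assumes that $n$ is a power of 2, that node $i$ has out-neighbors at hops $2^0, 2^1, \dots, 2^{\log_2(n)-1}$ (mod $n$), and that the $k$-th one-peer graph keeps only the hop $2^{k \bmod \log_2(n)}$ with weight 1/2 on the self-loop and 1/2 on that neighbor. All function names are hypothetical.

```python
from math import log2


def static_exp_neighbors(n, i):
    """Out-neighbors of node i in the assumed static exponential graph."""
    tau = int(log2(n))  # number of hops; assumes n is a power of 2
    return [(i + 2 ** t) % n for t in range(tau)]


def one_peer_exp_weights(n, k):
    """Assumed weight matrix of the k-th one-peer exponential graph."""
    tau = int(log2(n))
    hop = 2 ** (k % tau)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5              # keep half of the node's own value
        W[i][(i + hop) % n] = 0.5  # take half from the single active peer
    return W


def matmul(A, B):
    """Plain dense matrix product for small square matrices."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def product_over_period(n):
    """Multiply log2(n) consecutive one-peer matrices together."""
    tau = int(log2(n))
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(tau):
        P = matmul(one_peer_exp_weights(n, k), P)
    return P


n = 8
P = product_over_period(n)
# Exact averaging means every entry of the product equals 1/n.
is_exact = all(abs(P[i][j] - 1.0 / n) < 1e-12
               for i in range(n) for j in range(n))
```

With these assumptions, each one-peer matrix is $(I + S^{h})/2$ for a cyclic shift $S$, and the product over hops 1, 2, 4 expands to the uniform average of all eight shifts, which is why `is_exact` holds for `n = 8`. Each node uses one neighbor per iteration instead of the three returned by `static_exp_neighbors(8, 0)`.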
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Michigan (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
A Static Exponential Graph
Illustration of the 6-node static exponential graph and its associated weight matrix. Discrete Fourier Transform (DFT) and its connection to circulant matrices, which plays the critical role in the proof. Use the conjugate argument, then apply the same procedure as for i = 1. The shape of the 6-node topologies discussed in Sec. The ring topology is undirected, and is illustrated in Figure 1(a).
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
DICE: Data Influence Cascade in Decentralized Learning
Zhu, Tongtian, Li, Wenhao, Wang, Can, He, Fengxiang
Decentralized learning offers a promising approach to crowdsource data consumption and computational workloads across geographically distributed compute interconnected through peer-to-peer networks, accommodating exponentially increasing demands. However, proper incentives are still absent, which considerably discourages participation. Our vision is that a fair incentive mechanism relies on fair attribution of contributions to participating nodes, which faces non-trivial challenges because localized connections make influence "cascade" through a decentralized network. To overcome this, we design the first method to estimate Data Influence CascadE (DICE) in a decentralized environment. Theoretically, the framework derives tractable approximations of the influence cascade over arbitrary neighbor hops, suggesting that the influence cascade is determined by an interplay of data, communication topology, and the curvature of the loss landscape. DICE also lays the foundations for applications including selecting suitable collaborators and identifying malicious behaviors. The project page is available at https://raiden-zhu.github.io/blog/2025/DICE/.
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Overview (1.00)
- Research Report > Promising Solution (0.34)