An Improved Analysis of Gradient Tracking for Decentralized Machine Learning
–Neural Information Processing Systems
We consider decentralized machine learning over a network where the training data is distributed across n agents, each of which can compute stochastic model updates on their local data. The agent's common goal is to find a model that minimizes the average of all local loss functions. While gradient tracking (GT) algorithms can overcome a key challenge, namely accounting for differences between workers' local data distributions, the known convergence rates for GT algorithms are not optimal with respect to their dependence on the mixing parameter p (related to the spectral gap of the connectivity matrix).We provide a tighter analysis of the GT method in the stochastic strongly convex, convex and non-convex settings. We improve the dependency on p from \mathcal{O}(p {-2}) to \mathcal{O}(p {-1}c {-1}) in the noiseless case and from \mathcal{O}(p {-3/2}) to \mathcal{O}(p {-1/2}c {-1}) in the general stochastic case, where c \geq p is related to the negative eigenvalues of the connectivity matrix (and is a constant in most practical applications). This improvement was possible due to a new proof technique which could be of independent interest.
Neural Information Processing Systems
May-26-2025, 20:34:26 GMT
- Technology: