Generalization Error Analysis for Attack-Free and Byzantine-Resilient Decentralized Learning with Data Heterogeneity
Ye, Haoxiang, Sun, Tao, Ling, Qing
arXiv.org Artificial Intelligence
Decentralized learning, which facilitates joint model training across geographically scattered agents, has gained significant attention in the field of signal and information processing in recent years. While the optimization errors of decentralized learning algorithms have been extensively studied, their generalization errors remain relatively under-explored. As the generalization errors reflect the scalability of trained models on unseen data and are crucial in determining the performance of trained models in real-world applications, understanding the generalization errors of decentralized learning is of paramount importance. In this paper, we present fine-grained generalization error analysis for both attack-free and Byzantine-resilient decentralized learning with heterogeneous data and under mild assumptions, in contrast to prior studies that consider homogeneous data and/or rely on a stringent bounded stochastic gradient assumption. Our results shed light on the impact of data heterogeneity, model initialization, and stochastic gradient noise (factors that have not been closely investigated before) on the generalization error of decentralized learning. We also reveal that Byzantine attacks performed by malicious agents largely affect the generalization error, and that their negative impact is inherently linked to the data heterogeneity while remaining independent of the sample size. Numerical experiments on both convex and non-convex tasks are conducted to validate our theoretical findings.

Recent years have witnessed significant advances in distributed learning, which enables geographically scattered devices to collaboratively train models while ensuring the privacy of local data. According to the underlying network topology, distributed learning can be classified into two categories: federated learning and decentralized learning.
Federated learning relies on a central server to coordinate the learning process [2]-[8], while decentralized learning is able to operate autonomously without the need for a central server [9]-[18]. Notably, decentralized learning has gained increasing attention for its capacity to circumvent the communication bottleneck inherent in federated learning, caused by the central server.
Jun-12-2025