fl convergence
Robust Federated Learning in Unreliable Wireless Networks: A Client Selection Approach
Wang, Yanmeng, Ji, Wenkai, Zhou, Jian, Xiao, Fu, Chang, Tsung-Hui
Federated learning (FL) has emerged as a promising distributed learning paradigm for training deep neural networks (DNNs) at the wireless edge, but its performance can be severely hindered by unreliable wireless transmission and inherent data heterogeneity among clients. Existing solutions primarily address these challenges by incorporating wireless resource optimization strategies, often focusing on uplink resource allocation across clients under the assumption of homogeneous client-server network standards. However, these approaches overlook the fact that mobile clients may connect to the server via diverse network standards (e.g., 4G, 5G, Wi-Fi) with customized configurations, which limits the flexibility of server-side modifications and restricts applicability in real-world commercial networks. This paper presents a novel theoretical analysis of how transmission failures in unreliable networks distort the effective label distributions of local samples, causing deviations from the global data distribution and introducing convergence bias in FL. Our analysis reveals that a carefully designed client selection strategy can mitigate biases induced by network unreliability and data heterogeneity. Motivated by this insight, we propose FedCote, a client selection approach that optimizes client selection probabilities without relying on wireless resource scheduling. Experimental results demonstrate the robustness of FedCote in DNN-based classification tasks under unreliable networks with frequent transmission failures.
With rapid advancements in mobile communications and artificial intelligence (AI), edge AI, which leverages locally generated data to train deep neural networks (DNNs) at the wireless edge, has gained significant attention from both academia and industry [1], [2], [3], [4]. A prominent approach in this domain is federated learning (FL), where an edge server coordinates mobile clients in collaboratively training a shared DNN model while ensuring client privacy [5], [6], [7]. However, FL faces a critical challenge due to ubiquitous data heterogeneity across clients, where training data are distributed in a non-i.i.d. and unbalanced manner. If not addressed, data heterogeneity can severely degrade FL performance [8], [9], [10], [11], [12]. Numerous FL algorithms have been proposed to mitigate this issue. For example, FedProx [13] introduced a regularization term in the local objective function to control model divergence, while SCAFFOLD [14] employed control variates to correct local model drift. HFMDS [15] learned essential class-relevant features of real samples to generate an auxiliary synthetic dataset, which was shared among clients for local training, helping to alleviate data heterogeneity.
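As a rough illustration of the bias described in the abstract above, the following Python sketch mixes per-client label histograms by the probability that each client's update actually reaches the server. The two-client setup, the function name `effective_label_distribution`, and the inverse-success-probability selection rule are assumptions made for illustration; they are not FedCote's actual optimization, which the text above does not specify.

```python
import numpy as np

def effective_label_distribution(label_counts, success_prob, select_prob=None):
    """Expected label mix of the samples whose updates reach the server.

    label_counts: (clients, classes) array of local label histograms.
    success_prob: per-client probability that an upload succeeds (assumed known).
    select_prob:  per-client selection probability; uniform if omitted.
    """
    label_counts = np.asarray(label_counts, dtype=float)
    success_prob = np.asarray(success_prob, dtype=float)
    if select_prob is None:
        select_prob = np.full(len(success_prob), 1.0 / len(success_prob))
    # A client contributes only if it is selected AND its transmission succeeds.
    weights = np.asarray(select_prob, dtype=float) * success_prob
    mix = weights @ label_counts
    return mix / mix.sum()

# Hypothetical example: two clients with opposite label skew.
label_counts = np.array([[90, 10],
                         [10, 90]])
success_prob = np.array([0.9, 0.3])  # client 1 sits on a much worse link

global_dist = label_counts.sum(axis=0) / label_counts.sum()

# Uniform selection: the unreliable client is under-represented.
biased = effective_label_distribution(label_counts, success_prob)

# Illustrative correction (an assumption, not the paper's rule):
# select clients with probability inversely proportional to link reliability.
select_prob = 1.0 / success_prob
select_prob /= select_prob.sum()
corrected = effective_label_distribution(label_counts, success_prob, select_prob)

print("global   :", global_dist)
print("biased   :", biased)
print("corrected:", corrected)
```

In this toy setting the corrected mix matches the global distribution because both clients hold the same number of samples; with unbalanced data the selection probabilities would also need to account for local dataset sizes.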
On The Impact of Client Sampling on Federated Learning Convergence
Fraboni, Yann, Vidal, Richard, Kameni, Laetitia, Lorenzi, Marco
While client sampling is a central operation of current state-of-the-art federated learning (FL) approaches, the impact of this procedure on the convergence and speed of FL remains under-investigated to date. In this work we introduce a novel decomposition theorem for the convergence of FL, which allows us to clearly quantify the impact of client sampling on the global model update. Contrary to previous convergence analyses, our theorem provides the exact decomposition of a given convergence step, thus enabling accurate considerations about the role of client sampling and heterogeneity. First, we provide a theoretical grounding for previously reported results on the relationship between FL convergence and the variance of the aggregation weights. Second, we prove for the first time that the quality of FL convergence is also impacted by the resulting covariance between aggregation weights. Third, we establish that the sum of the aggregation weights is another source of slow-down and should be equal to 1 to improve FL convergence speed. Our theory is general, and is here applied to Multinomial Distribution (MD) and Uniform sampling, the two default client sampling schemes in FL, and demonstrated through a series of experiments in non-i.i.d. and unbalanced scenarios. Our results suggest that MD sampling should be used as the default sampling scheme, due to its resilience to changes in data ratio during the learning process, while Uniform sampling is superior only in the special case when clients have the same amount of data.
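The role of the aggregation weights can be made concrete with a small simulation. The Python sketch below assumes the commonly used definitions of the two schemes: MD sampling draws m clients with replacement according to their data ratios and averages the sampled models equally, while Uniform sampling draws m distinct clients uniformly and rescales each data ratio by n/m. The function names, the data ratios `p`, and the trial count are hypothetical choices for illustration.

```python
import numpy as np

def md_weights(p, m, rng, trials):
    """Aggregation weights under MD sampling (assumed definition):
    draw m clients with replacement, each draw contributes weight 1/m."""
    return rng.multinomial(m, p, size=trials) / m

def uniform_weights(p, m, rng, trials):
    """Aggregation weights under Uniform sampling (assumed definition):
    draw m distinct clients uniformly, weight (n/m) * p_i if sampled."""
    n = len(p)
    w = np.zeros((trials, n))
    for t in range(trials):
        chosen = rng.choice(n, size=m, replace=False)
        w[t, chosen] = (n / m) * p[chosen]
    return w

rng = np.random.default_rng(0)
p = np.array([0.5, 0.2, 0.1, 0.1, 0.1])  # hypothetical unbalanced data ratios
m, trials = 2, 20000

md = md_weights(p, m, rng, trials)
uni = uniform_weights(p, m, rng, trials)

# Both schemes are unbiased: the expected weight of client i is p_i.
print("mean MD weights     :", md.mean(axis=0).round(3))
print("mean Uniform weights:", uni.mean(axis=0).round(3))

# Only MD keeps the weights summing to 1 in every round on unbalanced data.
print("MD sum range        :", md.sum(axis=1).min(), md.sum(axis=1).max())
print("Uniform sum range   :", uni.sum(axis=1).min(), uni.sum(axis=1).max())

# Per-client variance and pairwise covariance of the weights.
print("MD covariance matrix:\n", np.cov(md, rowvar=False).round(3))
```

With equal data ratios (p_i = 1/n), the Uniform weights also sum to 1 in every round, which is consistent with the special case singled out in the abstract.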
Federated Learning in the Sky: Joint Power Allocation and Scheduling with UAV Swarms
Zeng, Tengchan, Semiari, Omid, Mozaffari, Mohammad, Chen, Mingzhe, Saad, Walid, Bennis, Mehdi
Unmanned aerial vehicle (UAV) swarms must exploit machine learning (ML) in order to execute various tasks ranging from coordinated trajectory planning to cooperative target recognition. However, due to the lack of continuous connections between the UAV swarm and ground base stations (BSs), using centralized ML will be challenging, particularly when dealing with a large volume of data. In this paper, a novel framework is proposed to implement distributed federated learning (FL) algorithms within a UAV swarm that consists of a leading UAV and several following UAVs. Each following UAV trains a local FL model based on its collected data and then sends this trained local model to the leading UAV, which aggregates the received models, generates a global FL model, and transmits it to the followers over the intra-swarm network. To identify how wireless factors, such as fading, transmission delay, and UAV antenna angle deviations resulting from wind and mechanical vibrations, impact the performance of FL, a rigorous convergence analysis for FL is performed. Then, a joint power allocation and scheduling design is proposed to optimize the convergence rate of FL while taking into account the energy consumption during convergence and the delay requirement imposed by the swarm's control system. Simulation results validate the effectiveness of the FL convergence analysis and show that the joint design strategy can reduce the number of communication rounds needed for convergence by as much as 35% compared with the baseline design.
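The leader-follower round described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: models are flat parameter vectors, the leader uses a sample-size-weighted average over whichever follower models arrive, and the `received` mask stands in for the wireless effects (fading, delay, antenna deviation) that the paper analyzes. The function name `leader_aggregate` and all values are hypothetical; the paper's actual aggregation rule and power allocation design are not reproduced here.

```python
import numpy as np

def leader_aggregate(global_model, follower_models, num_samples, received):
    """One aggregation step at the leading UAV (illustrative assumption).

    follower_models: (followers, params) array of local models.
    num_samples:     local dataset size of each follower.
    received:        0/1 mask, 1 if the follower's model reached the leader.
    """
    models = np.asarray(follower_models, dtype=float)
    # Followers whose transmission failed get zero aggregation weight.
    sizes = np.asarray(num_samples, dtype=float) * np.asarray(received, dtype=float)
    if sizes.sum() == 0:
        # Nothing arrived this round: keep the previous global model.
        return np.asarray(global_model, dtype=float)
    return (sizes[:, None] * models).sum(axis=0) / sizes.sum()

# Hypothetical round with three followers; the third upload is lost.
previous_global = np.zeros(2)
local_models = np.array([[1.0, 1.0],
                         [3.0, 3.0],
                         [5.0, 5.0]])
sizes = [10, 30, 60]

new_global = leader_aggregate(previous_global, local_models, sizes, received=[1, 1, 0])
print(new_global)
```

Because the lost follower holds most of the data in this example, the aggregate is pulled toward the two followers that got through, which is the kind of distortion a convergence analysis of wireless FL has to account for.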