Fan, Rongfei
Semantic Communication in Dynamic Channel Scenarios: Collaborative Optimization of Dual-Pipeline Joint Source-Channel Coding and Personalized Federated Learning
Yan, Xingrun, Zuo, Shiyuan, Lyu, Yifeng, Fan, Rongfei, Hu, Han
With the continuous advancement of wireless communication [3] introduced a attention feature model blended between network technologies and the widespread adoption of various feature extraction modules, enhancing adaptability to random data-intensive applications such as AR/VR multimedia, traditional channels but increasing complexity and latency. In contrast, communication systems are facing significant challenges [4] incorporated traditional modules like demodulation and in supporting massive data transmission. Concurrently, as the quantization into semantic communication, enabling adaptive developing of the sixth-generation (6G) network, the integration CSI optimization. of satellite internet into terrestrial communication systems In modern communication scenarios, network topologies becomes increasingly feasible. However, the satellite-to-ground typically feature multi-user access, where multiple client nodes transmission links are inherently constrained by limitations in connect to a central node, resembling noval network topologies bandwidth and latency. To address these existing and potential such as edge computing and self-organizing networks. However, challenges, deep learning-based joint source-channel coding in practical training and deployment, CSI exhibits continuous (Deep JSCC) has surfaced as a promising approach, serving as and dynamic variations, posing challenges for adaptive joint a method to realize Semantic Communication (SC).
On Theoretical Limits of Learning with Label Differential Privacy
Zhao, Puning, Ma, Chuan, Shen, Li, Wang, Shaowei, Fan, Rongfei
Label differential privacy (DP) is designed for learning problems involving private labels and public features. While various methods have been proposed for learning under label DP, the theoretical limits remain largely unexplored. In this paper, we investigate the fundamental limits of learning with label DP in both local and central models for both classification and regression tasks, characterized by minimax convergence rates. We establish lower bounds by converting each task into a multiple hypothesis testing problem and bounding the test error. Additionally, we develop algorithms that yield matching upper bounds. Our results demonstrate that under label local DP (LDP), the risk has a significantly faster convergence rate than that under full LDP, i.e. protecting both features and labels, indicating the advantages of relaxing the DP definition to focus solely on labels. In contrast, under the label central DP (CDP), the risk is only reduced by a constant factor compared to full DP, indicating that the relaxation of CDP only has limited benefits on the performance.
Learning with User-Level Local Differential Privacy
Zhao, Puning, Shen, Li, Fan, Rongfei, Li, Qingming, Wu, Huiwen, Wu, Jiafei, Liu, Zhe
User-level privacy is important in distributed systems. Previous research primarily focuses on the central model, while the local models have received much less attention. Under the central model, user-level DP is strictly stronger than the item-level one. However, under the local model, the relationship between user-level and item-level LDP becomes more complex, thus the analysis is crucially different. In this paper, we first analyze the mean estimation problem and then apply it to stochastic optimization, classification, and regression. In particular, we propose adaptive strategies to achieve optimal performance at all privacy levels. Moreover, we also obtain information-theoretic lower bounds, which show that the proposed methods are minimax optimal up to logarithmic factors. Unlike the central DP model, where user-level DP always leads to slower convergence, our result shows that under the local model, the convergence rates are nearly the same between user-level and item-level cases for distributions with bounded support. For heavy-tailed distributions, the user-level rate is even faster than the item-level one.
Enhancing Learning with Label Differential Privacy by Vector Approximation
Zhao, Puning, Fan, Rongfei, Wu, Huiwen, Li, Qingming, Wu, Jiafei, Liu, Zhe
Label differential privacy (DP) is a framework that protects the privacy of labels in training datasets, while the feature vectors are public. Existing approaches protect the privacy of labels by flipping them randomly, and then train a model to make the output approximate the privatized label. However, as the number of classes $K$ increases, stronger randomization is needed, thus the performances of these methods become significantly worse. In this paper, we propose a vector approximation approach, which is easy to implement and introduces little additional computational overhead. Instead of flipping each label into a single scalar, our method converts each label into a random vector with $K$ components, whose expectations reflect class conditional probabilities. Intuitively, vector approximation retains more information than scalar labels. A brief theoretical analysis shows that the performance of our method only decays slightly with $K$. Finally, we conduct experiments on both synthesized and real datasets, which validate our theoretical analysis as well as the practical performance of our method.
Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity
Zuo, Shiyuan, Yan, Xingrun, Fan, Rongfei, Hu, Han, Shan, Hangguan, Quek, Tony Q. S.
This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity. A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages the geometric median for aggregation and can freely select the round number for local updating. Different from most existing resilient approaches, which perform convergence analysis based on strongly-convex loss function or homogeneously distributed dataset, we conduct convergence analysis for not only strongly-convex but also non-convex loss function over heterogeneous dataset. According to our theoretical analysis, as long as the fraction of dataset from malicious users is less than half, RAGA can achieve convergence at rate $\mathcal{O}({1}/{T^{2/3- \delta}})$ where $T$ is the iteration number and $\delta \in (0, 2/3)$ for non-convex loss function, and at linear rate for strongly-convex loss function. Moreover, stationary point or global optimal solution is proved to obtainable as data heterogeneity vanishes. Experimental results corroborate the robustness of RAGA to Byzantine attacks and verifies the advantage of RAGA over baselines on convergence performance under various intensity of Byzantine attacks, for heterogeneous dataset.
Dynamic Routing for Integrated Satellite-Terrestrial Networks: A Constrained Multi-Agent Reinforcement Learning Approach
Lyu, Yifeng, Hu, Han, Fan, Rongfei, Liu, Zhi, An, Jianping, Mao, Shiwen
The integrated satellite-terrestrial network (ISTN) system has experienced significant growth, offering seamless communication services in remote areas with limited terrestrial infrastructure. However, designing a routing scheme for ISTN is exceedingly difficult, primarily due to the heightened complexity resulting from the inclusion of additional ground stations, along with the requirement to satisfy various constraints related to satellite service quality. To address these challenges, we study packet routing with ground stations and satellites working jointly to transmit packets, while prioritizing fast communication and meeting energy efficiency and packet loss requirements. Specifically, we formulate the problem of packet routing with constraints as a max-min problem using the Lagrange method. Then we propose a novel constrained Multi-Agent reinforcement learning (MARL) dynamic routing algorithm named CMADR, which efficiently balances objective improvement and constraint satisfaction during the updating of policy and Lagrange multipliers. Finally, we conduct extensive experiments and an ablation study using the OneWeb and Telesat mega-constellations. Results demonstrate that CMADR reduces the packet delay by a minimum of 21% and 15%, while meeting stringent energy consumption and packet loss rate constraints, outperforming several baseline algorithms.
Over-the-Air Computation Aided Federated Learning with the Aggregation of Normalized Gradient
Fan, Rongfei, An, Xuming, Zuo, Shiyuan, Hu, Han
Over-the-air computation is a communication-efficient solution for federated learning (FL). In such a system, iterative procedure is performed: Local gradient of private loss function is updated, amplified and then transmitted by every mobile device; the server receives the aggregated gradient all-at-once, generates and then broadcasts updated model parameters to every mobile device. In terms of amplification factor selection, most related works suppose the local gradient's maximal norm always happens although it actually fluctuates over iterations, which may degrade convergence performance. To circumvent this problem, we propose to turn local gradient to be normalized one before amplifying it. Under our proposed method, when the loss function is smooth, we prove our proposed method can converge to stationary point at sub-linear rate. In case of smooth and strongly convex loss function, we prove our proposed method can achieve minimal training loss at linear rate with any small positive tolerance. Moreover, a tradeoff between convergence rate and the tolerance is discovered. To speedup convergence, problems optimizing system parameters are also formulated for above two cases. Although being non-convex, optimal solution with polynomial complexity of the formulated problems are derived. Experimental results show our proposed method can outperform benchmark methods on convergence performance.
Federated Learning Robust to Byzantine Attacks: Achieving Zero Optimality Gap
Zuo, Shiyuan, Fan, Rongfei, Hu, Han, Zhang, Ning, Gong, Shimin
In this paper, we propose a robust aggregation method for federated learning (FL) that can effectively tackle malicious Byzantine attacks. At each user, model parameter is firstly updated by multiple steps, which is adjustable over iterations, and then pushed to the aggregation center directly. This decreases the number of interactions between the aggregation center and users, allows each user to set training parameter in a flexible way, and reduces computation burden compared with existing works that need to combine multiple historical model parameters. At the aggregation center, geometric median is leveraged to combine the received model parameters from each user. Rigorous proof shows that zero optimality gap is achieved by our proposed method with linear convergence, as long as the fraction of Byzantine attackers is below half. Numerical results verify the effectiveness of our proposed method.
Joint Power Control and Data Size Selection for Over-the-Air Computation Aided Federated Learning
An, Xuming, Fan, Rongfei, Zuo, Shiyuan, Hu, Han, Jiang, Hai, Zhang, Ning
Federated learning (FL) has emerged as an appealing machine learning approach to deal with massive raw data generated at multiple mobile devices, {which needs to aggregate the training model parameter of every mobile device at one base station (BS) iteratively}. For parameter aggregating in FL, over-the-air computation is a spectrum-efficient solution, which allows all mobile devices to transmit their parameter-mapped signals concurrently to a BS. Due to heterogeneous channel fading and noise, there exists difference between the BS's received signal and its desired signal, measured as the mean-squared error (MSE). To minimize the MSE, we propose to jointly optimize the signal amplification factors at the BS and the mobile devices as well as the data size (the number of data samples involved in local training) at every mobile device. The formulated problem is challenging to solve due to its non-convexity. To find the optimal solution, with some simplification on cost function and variable replacement, which still preserves equivalence, we transform the changed problem to be a bi-level problem equivalently. For the lower-level problem, optimal solution is found by enumerating every candidate solution from the Karush-Kuhn-Tucker (KKT) condition. For the upper-level problem, the optimal solution is found by exploring its piecewise convexity. Numerical results show that our proposed method can greatly reduce the MSE and can help to improve the training performance of FL compared with benchmark methods.