Goto

Collaborating Authors

 aggregation server




Robust Federated Learning against Model Perturbation in Edge Networks

arXiv.org Artificial Intelligence

Federated Learning (FL) is a promising paradigm for realizing edge intelligence, allowing collaborative learning among distributed edge devices by sharing models instead of raw data. However, the shared models are often assumed to be ideal, which would be inevitably violated in practice due to various perturbations, leading to significant performance degradation. To overcome this challenge, we propose a novel method, termed Sharpness-Aware Minimization-based Robust Federated Learning (SMRFL), which aims to improve model robustness against perturbations by exploring the geometrical property of the model landscape. Specifically, SMRFL solves a min-max optimization problem that promotes model convergence towards a flat minimum by minimizing the maximum loss within a neighborhood of the model parameters. In this way, model sensitivity to perturbations is reduced, and robustness is enhanced since models in the neighborhood of the flat minimum also enjoy low loss values. The theoretical result proves that SMRFL can converge at the same rate as FL without perturbations. Extensive experimental results show that SMRFL significantly enhances robustness against perturbations compared to three baseline methods on two real-world datasets under three perturbation scenarios.


Dual Defense: Enhancing Privacy and Mitigating Poisoning Attacks in Federated Learning

arXiv.org Artificial Intelligence

Federated learning (FL) is inherently susceptible to privacy breaches and poisoning attacks. To tackle these challenges, researchers have separately devised secure aggregation mechanisms to protect data privacy and robust aggregation methods that withstand poisoning attacks. However, simultaneously addressing both concerns is challenging; secure aggregation facilitates poisoning attacks as most anomaly detection techniques require access to unencrypted local model updates, which are obscured by secure aggregation. Few recent efforts to simultaneously tackle both challenges offen depend on impractical assumption of non-colluding two-server setups that disrupt FL's topology, or three-party computation which introduces scalability issues, complicating deployment and application. To overcome this dilemma, this paper introduce a Dual Defense Federated learning (DDFed) framework. DDFed simultaneously boosts privacy protection and mitigates poisoning attacks, without introducing new participant roles or disrupting the existing FL topology. DDFed initially leverages cutting-edge fully homomorphic encryption (FHE) to securely aggregate model updates, without the impractical requirement for non-colluding two-server setups and ensures strong privacy protection. Additionally, we proposes a unique two-phase anomaly detection mechanism for encrypted model updates, featuring secure similarity computation and feedback-driven collaborative selection, with additional measures to prevent potential privacy breaches from Byzantine clients incorporated into the detection process. We conducted extensive experiments on various model poisoning attacks and FL scenarios, including both cross-device and cross-silo FL. Experiments on publicly available datasets demonstrate that DDFed successfully protects model privacy and effectively defends against model poisoning threats.


How to Defend Against Large-scale Model Poisoning Attacks in Federated Learning: A Vertical Solution

arXiv.org Artificial Intelligence

Federated learning (FL) is vulnerable to model poisoning attacks due to its distributed nature. The current defenses start from all user gradients (model updates) in each communication round and solve for the optimal aggregation gradients (horizontal solution). This horizontal solution will completely fail when facing large-scale (>50%) model poisoning attacks. In this work, based on the key insight that the convergence process of the model is a highly predictable process, we break away from the traditional horizontal solution of defense and innovatively transform the problem of solving the optimal aggregation gradients into a vertical solution problem. We propose VERT, which uses global communication rounds as the vertical axis, trains a predictor using historical gradients information to predict user gradients, and compares the similarity with actual user gradients to precisely and efficiently select the optimal aggregation gradients. In order to reduce the computational complexity of VERT, we design a low dimensional vector projector to project the user gradients to a computationally acceptable length, and then perform subsequent predictor training and prediction tasks. Exhaustive experiments show that VERT is efficient and scalable, exhibiting excellent large-scale (>=80%) model poisoning defense effects under different FL scenarios. In addition, we can design projector with different structures for different model structures to adapt to aggregation servers with different computing power.


pFedGame -- Decentralized Federated Learning using Game Theory in Dynamic Topology

arXiv.org Machine Learning

Conventional federated learning frameworks suffer from several challenges including performance bottlenecks at the central aggregation server, data bias, poor model convergence, and exposure to model poisoning attacks, and limited trust in the centralized infrastructure. In the current paper, a novel game theory-based approach called pFedGame is proposed for decentralized federated learning, best suitable for temporally dynamic networks. The proposed algorithm works without any centralized server for aggregation and incorporates the problem of vanishing gradients and poor convergence over temporally dynamic topology among federated learning participants. The solution comprises two sequential steps in every federated learning round, for every participant. First, it selects suitable peers for collaboration in federated learning. Secondly, it executes a two-player constant sum cooperative game to reach convergence by applying an optimal federated learning aggregation strategy. Experiments performed to assess the performance of pFedGame in comparison to existing methods in decentralized federated learning have shown promising results with accuracy higher than 70% for heterogeneous data.


TreeCSS: An Efficient Framework for Vertical Federated Learning

arXiv.org Artificial Intelligence

Vertical federated learning (VFL) considers the case that the features of data samples are partitioned over different participants. VFL consists of two main steps, i.e., identify the common data samples for all participants (alignment) and train model using the aligned data samples (training). However, when there are many participants and data samples, both alignment and training become slow. As such, we propose TreeCSS as an efficient VFL framework that accelerates the two main steps. In particular, for sample alignment, we design an efficient multi-party private set intersection (MPSI) protocol called Tree-MPSI, which adopts a tree-based structure and a data-volume-aware scheduling strategy to parallelize alignment among the participants. As model training time scales with the number of data samples, we conduct coreset selection (CSS) to choose some representative data samples for training. Our CCS method adopts a clustering-based scheme for security and generality, which first clusters the features locally on each participant and then merges the local clustering results to select representative samples. In addition, we weight the samples according to their distances to the centroids to reflect their importance to model training. We evaluate the effectiveness and efficiency of our TreeCSS framework on various datasets and models. The results show that compared with vanilla VFL, TreeCSS accelerates training by up to 2.93x and achieves comparable model accuracy.


FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System

arXiv.org Artificial Intelligence

Federated Learning trains machine learning models on distributed devices by aggregating local model updates instead of local data. However, privacy concerns arise as the aggregated local models on the server may reveal sensitive personal information by inversion attacks. Privacy-preserving methods, such as homomorphic encryption (HE), then become necessary for FL training. Despite HE's privacy advantages, its applications suffer from impractical overheads, especially for foundation models. In this paper, we present FedML-HE, the first practical federated learning system with efficient HE-based secure model aggregation. FedML-HE proposes to selectively encrypt sensitive parameters, significantly reducing both computation and communication overheads during training while providing customizable privacy preservation. Our optimized system demonstrates considerable overhead reduction, particularly for large foundation models (e.g., ~10x reduction for ResNet-50, and up to ~40x reduction for BERT), demonstrating the potential for scalable HE-based FL deployment.


Federated Learning Incentive Mechanism under Buyers' Auction Market

arXiv.org Artificial Intelligence

Auction-based Federated Learning (AFL) enables open collaboration among self-interested data consumers and data owners. Existing AFL approaches are commonly under the assumption of sellers' market in that the service clients as sellers are treated as scarce resources so that the aggregation servers as buyers need to compete the bids. Yet, as the technology progresses, an increasing number of qualified clients are now capable of performing federated learning tasks, leading to shift from sellers' market to a buyers' market. In this paper, we shift the angle by adapting the procurement auction framework, aiming to explain the pricing behavior under buyers' market. Our modeling starts with basic setting under complete information, then move further to the scenario where sellers' information are not fully observable. In order to select clients with high reliability and data quality, and to prevent from external attacks, we utilize a blockchain-based reputation mechanism. The experimental results validate the effectiveness of our approach.


Addressing Class Variable Imbalance in Federated Semi-supervised Learning

arXiv.org Artificial Intelligence

Federated Semi-supervised Learning (FSSL) combines techniques from both fields of federated and semi-supervised learning to improve the accuracy and performance of models in a distributed environment by using a small fraction of labeled data and a large amount of unlabeled data. Without the need to centralize all data in one place for training, it collect updates of model training after devices train models at local, and thus can protect the privacy of user data. However, during the federal training process, some of the devices fail to collect enough data for local training, while new devices will be included to the group training. This leads to an unbalanced global data distribution and thus affect the performance of the global model training. Most of the current research is focusing on class imbalance with a fixed number of classes, while little attention is paid to data imbalance with a variable number of classes. Therefore, in this paper, we propose Federated Semi-supervised Learning for Class Variable Imbalance (FCVI) to solve class variable imbalance. The class-variable learning algorithm is used to mitigate the data imbalance due to changes of the number of classes. Our scheme is proved to be significantly better than baseline methods, while maintaining client privacy.