With the increased attention and legislation for data-privacy, collaborative machine learning (ML) algorithms are being developed to ensure the protection of private data used for processing. Federated learning (FL) is the most popular of these methods, which provides privacy preservation by facilitating collaborative training of a shared model without the need to exchange any private data with a centralized server. Rather, an abstraction of the data in the form of a machine learning model update is sent. Recent studies showed that such model updates may still very well leak private information and thus more structured risk assessment is needed. In this paper, we analyze existing vulnerabilities of FL and subsequently perform a literature review of the possible attack methods targetingFL privacy protection capabilities. These attack methods are then categorized by a basic taxonomy. Additionally, we provide a literature study of the most recent defensive strategies and algorithms for FL aimed to overcome these attacks. These defensive strategies are categorized by their respective underlying defence principle. The paper concludes that the application of a single defensive strategy is not enough to provide adequate protection to all available attack methods.
The decentralized nature of federated learning makes detecting and defending against adversarial attacks a challenging task. This paper focuses on backdoor attacks in the federated learning setting, where the goal of the adversary is to reduce the performance of the model on targeted tasks while maintaining good performance on the main task. Unlike existing works, we allow non-malicious clients to have correctly labeled samples from the targeted tasks. We conduct a comprehensive study of backdoor attacks and defenses for the EMNIST dataset, a real-life, user-partitioned, and non-iid dataset. We observe that in the absence of defenses, the performance of the attack largely depends on the fraction of adversaries present and the "complexity'' of the targeted task. Moreover, we show that norm clipping and "weak'' differential privacy mitigate the attacks without hurting the overall performance. We have implemented the attacks and defenses in TensorFlow Federated (TFF), a TensorFlow framework for federated learning. In open-sourcing our code, our goal is to encourage researchers to contribute new attacks and defenses and evaluate them on standard federated datasets.
Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving data analysis. In this article, we discuss the unique characteristics and challenges of federated learning, provide a broad overview of current approaches, and outline several directions of future work that are relevant to a wide range of research communities.
We survey distributed deep learning models for training or inference without accessing raw data from clients. These methods aim to protect confidential patterns in data while still allowing servers to train models. The distributed deep learning methods of federated learning, split learning and large batch stochastic gradient descent are compared in addition to private and secure approaches of differential privacy, homomorphic encryption, oblivious transfer and garbled circuits in the context of neural networks. We study their benefits, limitations and trade-offs with regards to computational resources, data leakage and communication efficiency and also share our anticipated future trends.
With the emergence of data silos and popular privacy awareness, the traditional centralized approach of training artificial intelligence (AI) models is facing strong challenges. Federated learning (FL) has recently emerged as a promising solution under this new reality. Existing FL protocol design has been shown to exhibit vulnerabilities which can be exploited by adversaries both within and without the system to compromise data privacy. It is thus of paramount importance to make FL system designers to be aware of the implications of future FL algorithm design on privacy-preservation. Currently, there is no survey on this topic. In this paper, we bridge this important gap in FL literature. By providing a concise introduction to the concept of FL, and a unique taxonomy covering threat models and two major attacks on FL: 1) poisoning attacks and 2) inference attacks, this paper provides an accessible review of this important topic. We highlight the intuitions, key techniques as well as fundamental assumptions adopted by various attacks, and discuss promising future research directions towards more robust privacy preservation in FL.