In the distributed collaborative machine learning (DCML) paradigm, federated learning (FL) has recently attracted much attention due to its applications in health, finance, and the latest innovations such as Industry 4.0 and smart vehicles. FL provides privacy-by-design: it trains a machine learning model collaboratively over several distributed clients (ranging from two to millions), such as mobile phones, without sharing their raw data with any other participant. In practical scenarios, not all clients have sufficient computing resources (e.g., Internet of Things devices), the machine learning model has millions of parameters, and model privacy between the server and the clients during training/testing is a prime concern (e.g., rival parties). In this regard, FL is not sufficient, so split learning (SL) was introduced. SL is well suited to these scenarios, as it splits a model into multiple portions, distributes them among the clients and the server, and trains/tests the respective model portions to accomplish full model training/testing. In SL, the participants share neither their data nor their model portions with any other party, and usually a smaller network portion is assigned to the clients, where the data reside. Recently, a hybrid of FL and SL, called splitfed learning, was introduced to combine the benefits of both FL (faster training/testing time) and SL (model splitting and training). Following the developments from FL to SL, and considering the importance of SL, this chapter is designed to provide extensive coverage of SL and its variants. The coverage includes fundamentals, existing findings, integration with privacy measures such as differential privacy, open problems, and code implementation.
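As a rough illustration of the split-learning workflow described above, the following is a minimal sketch of one training step for a single client (assuming PyTorch; the layer sizes, the cut point, and the names `client_net`/`server_net` are hypothetical, not from the chapter). The client computes activations up to the cut layer and sends only those activations to the server, later receiving the cut-layer gradients; raw data and the server's model portion are never exchanged. In this vanilla configuration the labels are shared with the server.

```python
import torch
import torch.nn as nn

# Hypothetical split of a small network at a "cut layer":
# the client holds the first portion, the server holds the rest.
client_net = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
server_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

opt_client = torch.optim.SGD(client_net.parameters(), lr=0.1)
opt_server = torch.optim.SGD(server_net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # client's private batch

# --- client side: forward pass up to the cut layer ---
smashed = client_net(x)                          # "smashed data" sent to the server
smashed_srv = smashed.detach().requires_grad_()  # server sees activations only

# --- server side: finish the forward pass, backprop to the cut layer ---
loss = loss_fn(server_net(smashed_srv), y)
opt_server.zero_grad()
loss.backward()
opt_server.step()

# --- client side: receive cut-layer gradients, finish backprop locally ---
opt_client.zero_grad()
smashed.backward(smashed_srv.grad)               # gradient returned by the server
opt_client.step()
```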
In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. In light of this need, federated learning has emerged as a popular training paradigm. However, many federated learning approaches trade transmitting data for communicating updated weight parameters from each local device. Therefore, a successful breach that would have otherwise directly compromised the data instead grants white-box access to the local model, which opens the door to a number of attacks, including exposing the very data federated learning seeks to protect. Additionally, in distributed scenarios, individual client devices commonly exhibit high statistical heterogeneity. Many common federated approaches learn a single global model; while this may do well on average, performance degrades when the i.i.d. assumption is violated, underfitting individuals further from the mean and raising questions of fairness. To address these issues, we propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks. Experiments on MNIST, FashionMNIST, and CIFAR-10 demonstrate WAFFLe's significant improvement in local test performance and fairness while simultaneously providing an extra layer of security.
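To make the factorization idea concrete, here is a loose sketch of composing per-client weights from a shared dictionary of weight factors with binary factor selection. WAFFLe places an Indian Buffet Process prior over the selection matrix; for brevity this sketch replaces it with fixed Bernoulli draws, and all names and shapes are hypothetical rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

K, d_in, d_out = 16, 784, 128                        # dictionary size, layer shape
dictionary = rng.normal(0, 0.01, (K, d_in, d_out))   # shared weight factors

# Per-client binary factor selection; WAFFLe models this matrix with an
# Indian Buffet Process prior, approximated here by Bernoulli draws.
num_clients = 5
selection = rng.random((num_clients, K)) < 0.3

def client_weight(c):
    """Compose client c's layer weight from its selected shared factors."""
    mask = selection[c].astype(float)
    return np.tensordot(mask, dictionary, axes=1)    # shape (d_in, d_out)

W3 = client_weight(3)
print(W3.shape, int(selection[3].sum()), "factors active")
```

Because each client only ever reveals which factors it mixes, not a full private model, a breach of the shared dictionary alone is less informative about any one client's data.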
In this paper, we propose FedGP, a framework for privacy-preserving data release in the federated learning setting. We use generative adversarial networks, whose generator components are trained with the FedAvg algorithm, to draw privacy-preserving artificial data samples and empirically assess the risk of information disclosure. Our experiments show that FedGP is able to generate labelled data of high quality to successfully train and validate supervised models. Finally, we demonstrate that our approach significantly reduces the vulnerability of such models to model inversion attacks.
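For reference, the FedAvg aggregation step that FedGP applies to the generator parameters can be sketched as follows. This is a minimal NumPy version of the standard FedAvg weighted average (weighting by local dataset size); the function and variable names are illustrative, not FedGP's code.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Weighted average of per-client parameter lists (standard FedAvg)."""
    total = sum(client_sizes)
    avg = [np.zeros_like(p) for p in client_params[0]]
    for params, n in zip(client_params, client_sizes):
        for a, p in zip(avg, params):
            a += (n / total) * p
    return avg

# Toy example: three clients, each holding same-shaped generator layers.
rng = np.random.default_rng(1)
clients = [[rng.normal(size=(4, 4)), rng.normal(size=(4,))] for _ in range(3)]
global_generator = fedavg(clients, client_sizes=[100, 50, 150])
```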
With the increased attention and legislation around data privacy, collaborative machine learning (ML) algorithms are being developed to ensure the protection of private data used for processing. Federated learning (FL) is the most popular of these methods, which provides privacy preservation by facilitating collaborative training of a shared model without the need to exchange any private data with a centralized server. Rather, an abstraction of the data in the form of a machine learning model update is sent. Recent studies have shown that such model updates may still leak private information, and thus a more structured risk assessment is needed. In this paper, we analyze existing vulnerabilities of FL and subsequently perform a literature review of the possible attack methods targeting FL's privacy protection capabilities. These attack methods are then categorized by a basic taxonomy. Additionally, we provide a literature study of the most recent defensive strategies and algorithms for FL aimed at overcoming these attacks. These defensive strategies are categorized by their respective underlying defence principle. The paper concludes that the application of a single defensive strategy is not enough to provide adequate protection against all available attack methods.
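To make the leakage risk concrete, a classic textbook example (not taken from the surveyed paper) shows that for a single linear layer trained on one example, the shared gradient reveals the input exactly, since the weight gradient is the outer product of the output error and the input.

```python
import numpy as np

rng = np.random.default_rng(2)

# One private example passing through a linear layer y = W x + b.
x = rng.normal(size=5)                     # the "private" input
W, b = rng.normal(size=(3, 5)), rng.normal(size=3)
target = rng.normal(size=3)

# Gradients of a squared-error loss, as a client would share in FL.
err = (W @ x + b) - target                 # dL/dy
grad_W = np.outer(err, x)                  # dL/dW = err x^T
grad_b = err                               # dL/db = err

# An eavesdropper holding only the gradients recovers x exactly:
i = np.argmax(np.abs(grad_b))              # any row with nonzero bias gradient
x_recovered = grad_W[i] / grad_b[i]
print(np.allclose(x, x_recovered))         # True
```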
We consider the problem of reinforcing federated learning with formal privacy guarantees. We propose to employ Bayesian differential privacy, a relaxation of differential privacy for similarly distributed data, to provide sharper privacy loss bounds. We adapt the Bayesian privacy accounting method to the federated setting and suggest multiple improvements for more efficient privacy budgeting at different levels. Our experiments show a significant advantage over the state-of-the-art differential privacy bounds for federated learning on image classification tasks, including a medical application, bringing the privacy budget below ε = 1 at the client level and below ε = 0.1 at the instance level. Lower amounts of noise also benefit the model accuracy and reduce the number of communication rounds.

INTRODUCTION

The rise of data analytics and machine learning (ML) presents countless opportunities for companies, governments, and individuals to benefit from the accumulated data. At the same time, their ability to capture fine levels of detail potentially compromises the privacy of data providers. Recent research suggests that even in a black-box setting it is possible to argue about the presence of individual records in the training set or to recover certain features of these records. To tackle this problem, a number of solutions have been proposed. They vary in how privacy is achieved and to what extent data is protected. One approach that assumes privacy at its core is federated learning (FL). In the FL setting, a central entity (server) trains a model on user data without actually copying data from user devices. Instead, users (clients) update models locally, and the server aggregates these updates. In spite of all its advantages, federated learning does not provide theoretical privacy guarantees of the kind offered by differential privacy (DP), which is viewed by many researchers as the privacy gold standard.
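As context for how such privacy budgets arise, below is a minimal sketch of the generic Gaussian-mechanism step used in differentially private federated averaging. This illustrates standard DP-FL noise injection, not the paper's Bayesian accounting; the clip norm, noise multiplier, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def dp_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.1):
    """Clip each client update, average, and add Gaussian noise
    (the generic Gaussian mechanism for DP federated averaging)."""
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / norm))
    avg = np.mean(clipped, axis=0)
    # Sensitivity of the average over n clients is clip_norm / n.
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return avg + rng.normal(0.0, sigma, size=avg.shape)

updates = [rng.normal(size=10) for _ in range(8)]   # toy client updates
noisy_global_update = dp_aggregate(updates)
```

A privacy accountant (moments accountant, or the Bayesian accounting the paper proposes) then translates the noise multiplier and the number of rounds into the reported ε values.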