Goto

Collaborating Authors

 bottom model


A Experimental Details

Neural Information Processing Systems

We train a variety of ResNets for comparing representations. We also train a wider ResNet-w2x and narrower ResNet-0.5x For experiments with changing label distribution, we also train the base ResNet-18. We perform the stitching on CIFAR-10. We consider a simple family of 5-layer CNNs, with four Conv-BatchNorm-ReLU-MaxPool layers and a fully-connected output layer following Page [2018].


Backdoor Attack on Vertical Federated Graph Neural Network Learning

Yang, Jirui, Chen, Peng, Lu, Zhihui, Deng, Ruijun, Duan, Qiang, Zeng, Jianping

arXiv.org Artificial Intelligence

Federated Graph Neural Network (FedGNN) is a privacy-preserving machine learning technology that combines federated learning (FL) and graph neural networks (GNNs). It offers a privacy-preserving solution for training GNNs using isolated graph data. Vertical Federated Graph Neural Network (VFGNN) is an important branch of FedGNN, where data features and labels are distributed among participants, and each participant has the same sample space. Due to the difficulty of accessing and modifying distributed data and labels, the vulnerability of VFGNN to backdoor attacks remains largely unexplored. In this context, we propose BVG, the first method for backdoor attacks in VFGNN. Without accessing or modifying labels, BVG uses multi-hop triggers and requires only four target class nodes for an effective backdoor attack. Experiments show that BVG achieves high attack success rates (ASR) across three datasets and three different GNN models, with minimal impact on main task accuracy (MTA). We also evaluate several defense methods, further validating the robustness and effectiveness of BVG. This finding also highlights the need for advanced defense mechanisms to counter sophisticated backdoor attacks in practical VFGNN applications.


LabObf: A Label Protection Scheme for Vertical Federated Learning Through Label Obfuscation

He, Ying, Niu, Mingyang, Hua, Jingyu, Mao, Yunlong, Huang, Xu, Li, Chen, Zhong, Sheng

arXiv.org Artificial Intelligence

Split learning, as one of the most common architectures in vertical federated learning, has gained widespread use in industry due to its privacy-preserving characteristics. In this architecture, the party holding the labels seeks cooperation from other parties to improve model performance due to insufficient feature data. Each of these participants has a self-defined bottom model to learn hidden representations from its own feature data and uploads the embedding vectors to the top model held by the label holder for final predictions. This design allows participants to conduct joint training without directly exchanging data. However, existing research points out that malicious participants may still infer label information from the uploaded embeddings, leading to privacy leakage. In this paper, we first propose an embedding extension attack that manually modifies embeddings to undermine existing defense strategies, which rely on constraining the correlation between the embeddings uploaded by participants and the labels. Subsequently, we propose a new label obfuscation defense strategy, called `LabObf', which randomly maps each original one-hot vector label to multiple numerical soft labels with values intertwined, significantly increasing the difficulty for attackers to infer the labels. We conduct experiments on four different types of datasets, and the results show that LabObf can reduce the attacker's success rate to near random guessing while maintaining an acceptable model accuracy.


Leveraging Label Information for Stealthy Data Stealing in Vertical Federated Learning

Yao, Duanyi, Li, Songze, Gong, Xueluan, Hou, Sizai, Pan, Gaoning

arXiv.org Artificial Intelligence

We develop DMAVFL, a novel attack strategy that evades current detection mechanisms. The key idea is to integrate a discriminator with auxiliary classifier that takes a full advantage of the label information (which was completely ignored in previous attacks): on one hand, label information helps to better characterize embeddings of samples from distinct classes, yielding an improved reconstruction performance; on the other hand, computing malicious gradients with label information better mimics the honest training, making the malicious gradients indistinguishable from the honest ones, and the attack much more stealthy. Our comprehensive experiments demonstrate that DMAVFL significantly outperforms existing attacks, and successfully circumvents SOTA defenses for malicious attacks. Additional ablation studies and evaluations on other defenses further underscore the robustness and effectiveness of DMAVFL.


KDk: A Defense Mechanism Against Label Inference Attacks in Vertical Federated Learning

Arazzi, Marco, Nicolazzo, Serena, Nocera, Antonino

arXiv.org Artificial Intelligence

Vertical Federated Learning (VFL) is a category of Federated Learning in which models are trained collaboratively among parties with vertically partitioned data. Typically, in a VFL scenario, the labels of the samples are kept private from all the parties except for the aggregating server, that is the label owner. Nevertheless, recent works discovered that by exploiting gradient information returned by the server to bottom models, with the knowledge of only a small set of auxiliary labels on a very limited subset of training data points, an adversary can infer the private labels. These attacks are known as label inference attacks in VFL. In our work, we propose a novel framework called KDk, that combines Knowledge Distillation and k-anonymity to provide a defense mechanism against potential label inference attacks in a VFL scenario. Through an exhaustive experimental campaign we demonstrate that by applying our approach, the performance of the analyzed label inference attacks decreases consistently, even by more than 60%, maintaining the accuracy of the whole VFL almost unaltered.


A Experimental Details

Neural Information Processing Systems

A.1 Networks used for comparison A.2 CIFAR-10: ResNets: We train a variety of ResNets for comparing representations. The base ResNet architecture for all our experiments is ResNet-18 [He et al., 2015] adapted to CIFAR-10 dimensions with 64 filters in the first convolutional layer. We also train a wider ResNet-w2x and narrower ResNet-0.5x For the deep ResNet, we train a ResNet-164 [He et al., 2015]. For the experiments with varying number of samples or training epochs, we train the base ResNet-18 with the specified number of samples and epochs.


Vertical Federated Image Segmentation

Mandal, Paul K., Leo, Cole

arXiv.org Artificial Intelligence

With the popularization of AI solutions for image based problems, there has been a growing concern for both data privacy and acquisition. In a large number of cases, information is located on separate data silos and it can be difficult for a developer to consolidate all of it in a fashion that is appropriate for machine learning model development. Alongside this, a portion of these localized data regions may not have access to a labelled ground truth. This indicates that they have the capacity to reach conclusions numerically, but are not able to assign classifications amid a lack of pertinent information. Such a determination is often negligible, especially when attempting to develop image based solutions that often necessitate this capability. With this being the case, we propose an innovative vertical federated learning (VFL) model architecture that can operate under this common set of conditions. This is the first (and currently the only) implementation of a system that can work under the constraints of a VFL environment and perform image segmentation while maintaining nominal accuracies. We achieved this by utilizing an FCN that boasts the ability to operate on federates that lack labelled data and privately share the respective weights with a central server, that of which hosts the necessary features for classification. Tests were conducted on the CamVid dataset in order to determine the impact of heavy feature compression required for the transfer of information between federates, as well as to reach nominal conclusions about the overall performance metrics when working under such constraints.


A Split-and-Privatize Framework for Large Language Model Fine-Tuning

Shen, Xicong, Liu, Yang, Liu, Huiqi, Hong, Jue, Duan, Bing, Huang, Zirui, Mao, Yunlong, Wu, Ye, Wu, Di

arXiv.org Artificial Intelligence

Fine-tuning is a prominent technique to adapt a pre-trained language model to downstream scenarios. In parameter-efficient fine-tuning, only a small subset of modules are trained over the downstream datasets, while leaving the rest of the pre-trained model frozen to save computation resources. In recent years, a popular productization form arises as Model-as-a-Service (MaaS), in which vendors provide abundant pre-trained language models, server resources and core functions, and customers can fine-tune, deploy and invoke their customized model by accessing the one-stop MaaS with their own private dataset. In this paper, we identify the model and data privacy leakage risks in MaaS fine-tuning, and propose a Split-and-Privatize (SAP) framework, which manage to mitigate the privacy issues by adapting the existing split learning architecture. The proposed SAP framework is sufficiently investigated by experiments, and the results indicate that it can enhance the empirical privacy by 62% at the cost of 1% model performance degradation on the Stanford Sentiment Treebank dataset.


MISA: Unveiling the Vulnerabilities in Split Federated Learning

Wan, Wei, Ning, Yuxuan, Hu, Shengshan, Xue, Lulu, Li, Minghui, Zhang, Leo Yu, Jin, Hai

arXiv.org Artificial Intelligence

\textit{Federated learning} (FL) and \textit{split learning} (SL) are prevailing distributed paradigms in recent years. They both enable shared global model training while keeping data localized on users' devices. The former excels in parallel execution capabilities, while the latter enjoys low dependence on edge computing resources and strong privacy protection. \textit{Split federated learning} (SFL) combines the strengths of both FL and SL, making it one of the most popular distributed architectures. Furthermore, a recent study has claimed that SFL exhibits robustness against poisoning attacks, with a fivefold improvement compared to FL in terms of robustness. In this paper, we present a novel poisoning attack known as MISA. It poisons both the top and bottom models, causing a \textbf{\underline{misa}}lignment in the global model, ultimately leading to a drastic accuracy collapse. This attack unveils the vulnerabilities in SFL, challenging the conventional belief that SFL is robust against poisoning attacks. Extensive experiments demonstrate that our proposed MISA poses a significant threat to the availability of SFL, underscoring the imperative for academia and industry to accord this matter due attention.


Input Reconstruction Attack against Vertical Federated Large Language Models

Zheng, Fei

arXiv.org Artificial Intelligence

Recently, large language models (LLMs) have drawn extensive attention from academia and the public, due to the advent of the ChatGPT. While LLMs show their astonishing ability in text generation for various tasks, privacy concerns limit their usage in real-life businesses. More specifically, either the user's inputs (the user sends the query to the model-hosting server) or the model (the user downloads the complete model) itself will be revealed during the usage. Vertical federated learning (VFL) is a promising solution to this kind of problem. It protects both the user's input and the knowledge of the model by splitting the model into a bottom part and a top part, which is maintained by the user and the model provider, respectively. However, in this paper, we demonstrate that in LLMs, VFL fails to protect the user input since it is simple and cheap to reconstruct the input from the intermediate embeddings. Experiments show that even with a commercial GPU, the input sentence can be reconstructed in only one second. We also discuss several possible solutions to enhance the privacy of vertical federated LLMs.