Graph Neural Networks (GNNs) have demonstrated superior performance in learning graph representations for a variety of downstream inference tasks. However, learning over graph data can raise privacy concerns when nodes represent people or human-related variables involving personal information. Prior work has presented various techniques for privacy-preserving deep learning over non-relational data, such as images, audio, video, and text, but far less work has addressed the privacy issues that arise when applying deep learning algorithms to graphs. In this paper, we develop a privacy-preserving GNN learning algorithm with formal privacy guarantees based on Local Differential Privacy (LDP) to tackle the problem of node-level privacy, where graph nodes have potentially sensitive features that must be kept private, yet could help an untrusted server learn richer node representations. Specifically, we propose an optimized LDP algorithm with an unbiased estimator, with which a central server can communicate with the graph nodes to privately collect their data and estimate the graph convolution layer of the GNN. To further reduce the effect of the injected noise, we propose a simple graph convolution layer based on multi-hop aggregation of the nodes' features. Extensive experiments on real-world datasets demonstrate that our method maintains a favorable privacy-accuracy trade-off for privacy-preserving node classification.
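The unbiased-estimator idea can be illustrated with the classic one-bit LDP mechanism: each node perturbs a bounded feature into a single private bit, and the server rescales the report so its expectation equals the true value. This is a minimal sketch of the general principle, not the paper's optimized multi-bit algorithm; all function names and parameters are illustrative.

```python
import numpy as np

def one_bit_perturb(x, eps, rng):
    """Perturb a feature x in [-1, 1] into one private bit.

    Reports +1 with probability growing linearly in x, calibrated so the
    mechanism satisfies eps-LDP (the classic one-bit mechanism; the
    paper's optimized multi-bit variant generalizes this idea).
    """
    e = np.exp(eps)
    p = 1.0 / (e + 1.0) + (x + 1.0) / 2.0 * (e - 1.0) / (e + 1.0)
    return np.where(rng.random(np.shape(x)) < p, 1.0, -1.0)

def unbiased_estimate(bit, eps):
    """Rescale the reported bit so its expectation equals the true x."""
    return bit * (np.exp(eps) + 1.0) / (np.exp(eps) - 1.0)

rng = np.random.default_rng(0)
eps = 1.0
x = np.full(100_000, 0.3)            # every node holds the same feature value
bits = one_bit_perturb(x, eps, rng)  # all the server ever sees is +/-1 bits
est = unbiased_estimate(bits, eps)   # unbiased but noisy per-node estimates
print(est.mean())                    # aggregation averages the noise away
```

Each individual estimate is very noisy (that is the privacy), but averaging over many nodes, as a graph convolution layer does, recovers the aggregate value, which is why multi-hop aggregation further dampens the injected noise.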
In machine learning, boosting is one of the most popular methods for combining multiple base learners into a superior one. The well-known boosted decision tree classifier has been widely adopted in many areas. In the big data era, the data held by individuals and entities, such as personal images, browsing history, and census information, are likely to contain sensitive information. Privacy concerns arise when such data leave the owners' hands to be further explored or mined, and they demand that machine learning algorithms be privacy-aware. Recently, Local Differential Privacy (LDP) has been proposed as an effective privacy protection approach that offers a strong guarantee to data owners: the data are perturbed before any further use, so the true values never leave the owners' hands. Learning from such privatized data instances is therefore of great value and importance. In this paper, we develop a privacy-preserving boosting algorithm that allows a data user to build a classifier without knowing or being able to derive the exact value of any data sample. Our experiments demonstrate the effectiveness of the proposed boosting algorithm and the high utility of the learned classifiers.
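The "perturb before any further use" guarantee can be sketched with randomized response, the textbook LDP mechanism for binary features; the data user sees only flipped bits yet can still recover unbiased aggregate statistics to drive learning. This is a generic illustration of the LDP principle, not the paper's specific boosting protocol.

```python
import numpy as np

def randomized_response(bits, eps, rng):
    """eps-LDP perturbation of a binary feature: keep each bit with
    probability e^eps / (e^eps + 1), otherwise flip it, so true values
    never leave the owners' hands."""
    p_keep = np.exp(eps) / (np.exp(eps) + 1.0)
    flip = rng.random(bits.shape) >= p_keep
    return np.where(flip, 1 - bits, bits)

def debias_frequency(reported, eps):
    """Unbiased estimate of the true fraction of 1s from noisy reports."""
    p = np.exp(eps) / (np.exp(eps) + 1.0)
    q = 1.0 - p
    return (reported.mean() - q) / (p - q)

rng = np.random.default_rng(1)
true_bits = (rng.random(50_000) < 0.2).astype(int)   # 20% of owners hold a 1
noisy = randomized_response(true_bits, eps=1.0, rng=rng)
est = debias_frequency(noisy, eps=1.0)
print(est)   # close to 0.2, recovered without seeing any true bit
```

Split-point statistics of this kind are the raw material a boosting algorithm needs, which is why a classifier can be trained even though no exact sample value is ever revealed.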
Ensuring the privacy of sensitive data used to train modern machine learning models is of paramount importance in many areas of practice. One approach to studying these concerns is through the lens of differential privacy. In this framework, privacy guarantees are generally obtained by perturbing models so that specifics of the data used to train them are made ambiguous. A particular instance of this approach is the "teacher-student" framework, wherein the teacher, who owns the sensitive data, provides the student with useful but noisy information, ideally allowing the student model to perform well on a given task without access to particular features of the sensitive data. Because stronger privacy guarantees generally involve more significant perturbation on the part of the teacher, deploying existing frameworks fundamentally involves a trade-off between the student's performance and the privacy guarantee. One of the most important techniques used in previous work involves an ensemble of teacher models that return information to a student based on a noisy voting procedure. In this work, we propose a novel voting mechanism with smooth sensitivity, which we call Immutable Noisy ArgMax, that, under certain conditions, can bear very large random noise from the teacher without affecting the useful information transferred to the student. Compared with previous work, our approach improves over state-of-the-art methods on all measures and scales to larger tasks with both better performance and stronger privacy ($\epsilon \approx 0$). The proposed framework can be applied to any machine learning model and provides an appealing solution for tasks that require training on a large amount of data.
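The noisy voting procedure can be sketched as follows: teachers' votes are tallied into a histogram, Laplace noise is added, and the argmax is returned as the student's label. The `gap` parameter hints at the core intuition, in that boosting the winning count makes the argmax robust to very large noise whenever the vote margin is wide; this is an illustrative simplification, not the paper's exact Immutable Noisy ArgMax construction.

```python
import numpy as np

def noisy_argmax(teacher_votes, num_classes, noise_scale, gap=0.0, rng=None):
    """Aggregate an ensemble of teacher votes with Laplace noise.

    With gap=0 this is the standard noisy-vote baseline. gap > 0 sketches
    the 'immutable' idea: adding a large constant to the winning count
    keeps the argmax unchanged even under heavy noise (a hypothetical
    simplification of the paper's mechanism).
    """
    rng = rng or np.random.default_rng(0)
    hist = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    hist[np.argmax(hist)] += gap
    return int(np.argmax(hist + rng.laplace(0.0, noise_scale, num_classes)))

# 100 teachers label one query; class 2 wins by a wide margin.
votes = np.array([2] * 90 + [0] * 5 + [1] * 5)
label = noisy_argmax(votes, num_classes=3, noise_scale=20.0, gap=1000.0)
print(label)  # class 2 survives noise far larger than the raw vote counts
```

Because only the argmax, not the noisy counts, reaches the student, a large enough gap means the transferred label is essentially deterministic while the noise magnitude, and hence the privacy accounting, can be made very large.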
As increasing amounts of sensitive personal information find their way into data repositories, it is important to develop analysis mechanisms that can derive aggregate information from these repositories without revealing information about individual data instances. Though the differential privacy model provides a framework for analyzing such mechanisms for databases belonging to a single party, this framework has not yet been considered in a multi-party setting. In this paper, we propose a privacy-preserving protocol for composing a differentially private aggregate classifier from classifiers trained locally by separate, mutually untrusting parties. The protocol allows these parties to interact with an untrusted curator to construct additive shares of a perturbed aggregate classifier. We also present a detailed theoretical analysis containing a proof of differential privacy for the perturbed aggregate classifier and a bound on the excess risk introduced by the perturbation. We verify the bound with an experimental evaluation on a real dataset.
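The aggregation step can be sketched as below, with a simple class-mean linear classifier standing in for each party's local training and plain averaging standing in for the additive-share protocol (in the actual protocol the curator only ever sees shares, never an individual classifier). The sensitivity value and all names here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def local_classifier(X, y):
    """Stand-in for a party's local training: a linear direction from
    class-mean differences (a real deployment would train, e.g., a
    regularized logistic regression)."""
    return X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

def private_aggregate(local_models, eps, sensitivity, rng):
    """Curator-side aggregation: average the parties' classifiers and add
    Laplace noise calibrated to eps and an assumed sensitivity, so the
    released aggregate is differentially private."""
    avg = np.mean(local_models, axis=0)
    scale = sensitivity / (len(local_models) * eps)
    return avg + rng.laplace(0.0, scale, size=avg.shape)

rng = np.random.default_rng(2)
parties = []
for _ in range(5):                     # five mutually untrusting parties
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] > 0).astype(int)      # the label depends on feature 0
    parties.append(local_classifier(X, y))

w = private_aggregate(parties, eps=1.0, sensitivity=2.0, rng=rng)
print(w.round(2))                      # noisy, but dominated by feature 0
```

Averaging over more parties shrinks the noise scale, which is the source of the excess-risk bound: the perturbation's cost decreases as the number of contributing classifiers grows.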
Recent advances in computing have made it possible to collect large amounts of data on personal activities and private living spaces. Collecting and publishing a dataset in this environment can raise concerns over the privacy of the individuals in the dataset. In this paper we examine these privacy concerns: in particular, given a target application, how can we mask sensitive attributes in the data while preserving the utility of the data for that application? Our focus is on protecting attributes that are hidden but can be inferred from the data by machine learning algorithms. We propose a generic framework that (1) removes the knowledge useful for inferring sensitive information, but (2) preserves the knowledge relevant to a given target application. We use deep neural networks and generative adversarial networks (GANs) to create privacy-preserving perturbations. Our noise-generating network is compact and efficient enough to run on mobile devices. Through extensive experiments, we show that our method outperforms conventional methods in effectively hiding the sensitive attributes while maintaining high performance for the target application. Our results hold for neural network architectures not seen during training, and the perturbed data remain suitable for training new classifiers.
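The dual objective of removing sensitive knowledge while preserving target knowledge can be illustrated with a stripped-down sketch: instead of a GAN-trained noise generator, it optimizes a perturbation directly by gradient descent on a combined loss, with two fixed linear classifiers standing in for the sensitive-attribute and target inference networks. Everything here, including the simple loss-flipping objective, is an illustrative simplification rather than the paper's architecture.

```python
import numpy as np

def sigmoid(z):
    # Clip the logit to avoid overflow once the perturbation saturates.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

# Hypothetical stand-ins for the trained inference networks:
# orthogonal linear directions for the target and sensitive attributes.
w_target = np.array([1.0, 0.0])
w_sensitive = np.array([0.0, 1.0])

rng = np.random.default_rng(3)
n = 200
signs = rng.choice([-1.0, 1.0], size=(n, 2))
X = 2.0 * signs + 0.1 * rng.normal(size=(n, 2))
y_t = (X[:, 0] > 0).astype(float)   # target label, readable from feature 0
y_s = (X[:, 1] > 0).astype(float)   # sensitive label, readable from feature 1

# Optimize a per-sample perturbation delta that keeps the target loss low
# while maximizing the sensitive classifier's loss. For a linear model the
# cross-entropy gradient is (sigmoid(z) - y) * w.
delta = np.zeros_like(X)
lam, lr = 5.0, 0.5                  # privacy weight and step size (arbitrary)
for _ in range(300):
    Z = X + delta
    g_t = (sigmoid(Z @ w_target) - y_t)[:, None] * w_target
    g_s = (sigmoid(Z @ w_sensitive) - y_s)[:, None] * w_sensitive
    delta -= lr * (g_t - lam * g_s)  # descend target loss, ascend sensitive loss

Z = X + delta
acc_target = ((Z @ w_target > 0) == (y_t == 1)).mean()
acc_sensitive = ((Z @ w_sensitive > 0) == (y_s == 1)).mean()
print(acc_target, acc_sensitive)
```

In this toy setup the perturbation drives the sensitive classifier to chance or worse while leaving the target predictions intact; a real system would instead train a generator to make the sensitive attribute indistinguishable (not merely flipped), which is what the adversarial GAN formulation provides.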