Goto

Collaborating Authors

 privacy and fairness


Interview with Lea Demelius: Researching differential privacy

AIHub

In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. The Doctoral Consortium provides an opportunity for a group of PhD students to discuss and explore their research interests and career objectives in an interdisciplinary workshop together with a panel of established researchers. In the latest interview, we hear from Lea Demelius, who is researching differential privacy. I am studying at the University of Technology Graz in Austria. My research focuses on differential privacy, which is widely regarded as the state-of-the-art for protecting privacy in data analysis and machine learning.


Empirical Analysis of Privacy-Fairness-Accuracy Trade-offs in Federated Learning: A Step Towards Responsible AI

arXiv.org Artificial Intelligence

Federated Learning (FL) enables collaborative machine learning while preserving data privacy but struggles to balance privacy preservation (PP) and fairness. Techniques like Differential Privacy (DP), Homomorphic Encryption (HE), and Secure Multi-Party Computation (SMC) protect sensitive data but introduce trade-offs. DP enhances privacy but can disproportionately impact underrepresented groups, while HE and SMC mitigate fairness concerns at the cost of computational overhead. This work explores the privacy-fairness trade-offs in FL under IID (Independent and Identically Distributed) and non-IID data distributions, benchmarking q-FedAvg, q-MAML, and Ditto on diverse datasets. Our findings highlight context-dependent trade-offs and offer guidelines for designing FL systems that uphold responsible AI principles, ensuring fairness, privacy, and equitable real-world applications.


Privacy-Preserving Fair Synthetic Tabular Data

arXiv.org Artificial Intelligence

Sharing of tabular data containing valuable but private information is limited due to legal and ethical issues. Synthetic data could be an alternative solution to this sharing problem, as it is artificially generated by machine learning algorithms and tries to capture the underlying data distribution. However, machine learning models are not free from memorization and may introduce biases, as they rely on training data. Producing synthetic data that preserves privacy and fairness while maintaining utility close to the real data is a challenging task. This research simultaneously addresses both the privacy and fairness aspects of synthetic data, an area not explored by other studies. In this work, we present PF-WGAN, a privacy-preserving, fair synthetic tabular data generator based on the WGAN-GP model. We have modified the original WGAN-GP by adding privacy and fairness constraints forcing it to produce privacy-preserving fair data. This approach will enable the publication of datasets that protect individual's privacy and remain unbiased toward any particular group. We compared the results with three state-of-the-art synthetic data generator models in terms of utility, privacy, and fairness across four different datasets. We found that the proposed model exhibits a more balanced trade-off among utility, privacy, and fairness.


Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms

arXiv.org Artificial Intelligence

The increasing use of machine learning in learning analytics (LA) has raised significant concerns around algorithmic fairness and privacy. Synthetic data has emerged as a dual-purpose tool, enhancing privacy and improving fairness in LA models. However, prior research suggests an inverse relationship between fairness and privacy, making it challenging to optimize both. This study investigates which synthetic data generators can best balance privacy and fairness, and whether pre-processing fairness algorithms, typically applied to real datasets, are effective on synthetic data. Our results highlight that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness. However, DECAF suffers in utility, as reflected in its predictive accuracy. Notably, we found that applying pre-processing fairness algorithms to synthetic data improves fairness even more than when applied to real data. These findings suggest that combining synthetic data generation with fairness pre-processing offers a promising approach to creating fairer LA models.


PFGuard: A Generative Framework with Privacy and Fairness Safeguards

arXiv.org Artificial Intelligence

Generative models must ensure both privacy and fairness for Trustworthy AI. While these goals have been pursued separately, recent studies propose to combine existing privacy and fairness techniques to achieve both goals. However, naïvely combining these techniques can be insufficient due to privacy-fairness conflicts, where a sample in a minority group may be amplified for fairness, only to be suppressed for privacy. We demonstrate how these conflicts lead to adverse effects, such as privacy violations and unexpected fairness-utility tradeoffs. To mitigate these risks, we propose PFGuard, a generative framework with privacy and fairness safeguards, which simultaneously addresses privacy, fairness, and utility. By using an ensemble of multiple teacher models, PFGuard balances privacy-fairness conflicts between fair and private training stages and achieves high utility based on ensemble learning. Extensive experiments show that PFGuard successfully generates synthetic data on high-dimensional data while providing both fairness convergence and strict DP guarantees - the first of its kind to our knowledge. Recently, generative models have shown remarkable performance in various applications including vision (Wang et al., 2021b) and language tasks (Brown et al., 2020) - while also raising significant ethical concerns. In particular, privacy and fairness concerns have emerged due to generative models mimicking their training data. On the privacy side, specific training data can be memorized, allowing the reconstruction of sensitive information like facial images (Hilprecht et al., 2019; Sun et al., 2021). On the fairness side, any bias in the training data can be learned, resulting in biased synthetic data and unfair downstream performances across demographic groups (Zhao et al., 2018; Tan et al., 2020). Although privacy and fairness are both essential for generative models, previous research has primarily tackled them separately. Differential Privacy (DP) techniques (Dwork et al., 2014), which provide rigorous privacy guarantees, have been developed for private generative models (Xie et al., 2018; Jordon et al., 2018); various fair training techniques, which remove data bias and generate more balanced synthetic data, have been proposed for fair generative models (Xu et al., 2018; Choi et al., 2020). To achieve both objectives, harnessing these techniques has emerged as a promising direction. For example, Xu et al. (2021) combine a fair pre-processing technique (Celis et al., 2020) with a private generative model (Chanyaswad et al., 2019) to train both fair and private generative models.


A Multivocal Literature Review on Privacy and Fairness in Federated Learning

arXiv.org Artificial Intelligence

Federated Learning presents a way to revolutionize AI applications by eliminating the necessity for data sharing. Yet, research has shown that information can still be extracted during training, making additional privacy-preserving measures such as differential privacy imperative. To implement real-world federated learning applications, fairness, ranging from a fair distribution of performance to non-discriminative behaviour, must be considered. Particularly in high-risk applications (e.g. healthcare), avoiding the repetition of past discriminatory errors is paramount. As recent research has demonstrated an inherent tension between privacy and fairness, we conduct a multivocal literature review to examine the current methods to integrate privacy and fairness in federated learning. Our analyses illustrate that the relationship between privacy and fairness has been neglected, posing a critical risk for real-world applications. We highlight the need to explore the relationship between privacy, fairness, and performance, advocating for the creation of integrated federated learning frameworks.


Linkage on Security, Privacy and Fairness in Federated Learning: New Balances and New Perspectives

arXiv.org Artificial Intelligence

Federated learning is fast becoming a popular paradigm for applications involving mobile devices, banking systems, healthcare, and IoT systems. Hence, over the past five years, researchers have undertaken extensive studies on the privacy leaks, security threats, and fairness associated with these emerging models. For the most part, these three critical concepts have been studied in isolation; however, recent research has revealed that there may be an intricate interplay between them. For instance, some researchers have discovered that pursuing fairness may compromise privacy, or that efforts to enhance security can impact fairness. These emerging insights shed light on the fundamental connections between privacy, security, and fairness within federated learning, and, by delving deeper into these interconnections, we may be able to significantly augment research and development across the field. Consequently, the aim of this survey is to offer comprehensive descriptions of the privacy, security, and fairness issues in federated learning. Moreover, we analyze the complex relationships between these three dimensions of cyber safety and pinpoint the fundamental elements that influence each of them. We contend that there exists a trade-off between privacy and fairness and between security and gradient sharing. On this basis, fairness can function as a bridge between privacy and security to build models that are either more secure or more private. Building upon our observations, we identify the trade-offs between privacy and fairness and between security and fairness within the context of federated learning. The survey then concludes with promising directions for future research in this vanguard field.


Privacy for Fairness: Information Obfuscation for Fair Representation Learning with Local Differential Privacy

arXiv.org Artificial Intelligence

As machine learning (ML) becomes more prevalent in human-centric applications, there is a growing emphasis on algorithmic fairness and privacy protection. While previous research has explored these areas as separate objectives, there is a growing recognition of the complex relationship between privacy and fairness. However, previous works have primarily focused on examining the interplay between privacy and fairness through empirical investigations, with limited attention given to theoretical exploration. This study aims to bridge this gap by introducing a theoretical framework that enables a comprehensive examination of their interrelation. We shall develop and analyze an information bottleneck (IB) based information obfuscation method with local differential privacy (LDP) for fair representation learning. In contrast to many empirical studies on fairness in ML, we show that the incorporation of LDP randomizers during the encoding process can enhance the fairness of the learned representation. Our analysis will demonstrate that the disclosure of sensitive information is constrained by the privacy budget of the LDP randomizer, thereby enabling the optimization process within the IB framework to effectively suppress sensitive information while preserving the desired utility through obfuscation. Based on the proposed method, we further develop a variational representation encoding approach that simultaneously achieves fairness and LDP. Our variational encoding approach offers practical advantages. It is trained using a non-adversarial method and does not require the introduction of any variational prior. Extensive experiments will be presented to validate our theoretical results and demonstrate the ability of our proposed approach to achieve both LDP and fairness while preserving adequate utility.


On the Impact of Multi-dimensional Local Differential Privacy on Fairness

arXiv.org Artificial Intelligence

Data collected about individuals is regularly used to make decisions that impact those same individuals. For example, census statistics have important implications for all aspects of daily life, including the allocation of political power, the distribution of federal funds, and research in economics and social sciences. In banking industries, machine learning (ML) models leverage data to proactively monitor customer behavior, reduce the likelihood of false positives, and prevent fraud. In these settings, there is a tension between the need for accurate systems, in which individuals receive what they deserve, and the need to protect individuals from improper disclosure of their sensitive information. Differential privacy (DP) [23] is now widely recognized as the gold standard for providing formal guarantees on the privacy level achieved by an algorithm. However, central DP can only be used on the assumption of a trustworthy server. Local DP (LDP) [32] is a variant that achieves privacy guarantees for each user locally with no assumptions on third-party servers. In other words, LDP ensures that each user's data is locally obfuscated first on the client-side and then sent to the server-side, thus protecting data from privacy leaks on both the client-side and the server-side. Many Big tech companies have deployed LDP-based algorithms to use in their industrial products (e.g., Google Chrome [24] and Apple iOS [4]).


Automated discovery of trade-off between utility, privacy and fairness in machine learning models

arXiv.org Artificial Intelligence

Machine learning models are deployed as a central component in decision making and policy operations with direct impact on individuals' lives. In order to act ethically and comply with government regulations, these models need to make fair decisions and protect the users' privacy. However, such requirements can come with decrease in models' performance compared to their potentially biased, privacy-leaking counterparts. Thus the trade-off between fairness, privacy and performance of ML models emerges, and practitioners need a way of quantifying this trade-off to enable deployment decisions. In this work we interpret this trade-off as a multi-objective optimization problem, and propose PFairDP, a pipeline that uses Bayesian optimization for discovery of Pareto-optimal points between fairness, privacy and utility of ML models. We show how PFairDP can be used to replicate known results that were achieved through manual constraint setting process. We further demonstrate effectiveness of PFairDP with experiments on multiple models and datasets.