 f-differential privacy


A Statistical Viewpoint on Differential Privacy: Hypothesis Testing, Representation and Blackwell's Theorem

arXiv.org Machine Learning

Differential privacy is widely considered the formal privacy standard for privacy-preserving data analysis due to its robust and rigorous guarantees, with increasingly broad adoption in public services, academia, and industry. Although it originated in the cryptographic context, in this review paper we argue that, fundamentally, differential privacy can be considered a pure statistical concept. By leveraging a theorem due to David Blackwell, our focus is to demonstrate that the definition of differential privacy can be formally motivated from a hypothesis testing perspective, thereby showing that hypothesis testing is not merely convenient but also the right language for reasoning about differential privacy. This insight leads to the definition of $f$-differential privacy, which extends other differential privacy definitions through a representation theorem. We review techniques that render $f$-differential privacy a unified framework for analyzing privacy bounds in data analysis and machine learning. Applications of this differential privacy definition to private deep learning, private convex optimization, shuffled mechanisms, and U.S. Census data are discussed to highlight the benefits of analyzing privacy bounds under this framework compared to existing alternatives.
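As a rough illustration of the hypothesis-testing view, the sketch below evaluates two trade-off functions, each mapping a type I error level to the smallest achievable type II error for an adversary trying to tell two neighboring datasets apart. The function names `gaussian_tradeoff` and `eps_delta_tradeoff` are hypothetical, and the formulas are the standard ones from the $f$-differential privacy literature (Gaussian differential privacy and $(\varepsilon, \delta)$-differential privacy), not something stated in the abstract itself.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Trade-off function for testing N(0, 1) against N(mu, 1).

    Computes G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu), the smallest
    type II error at type I error level alpha.
    """
    p = 1.0 - alpha
    # inv_cdf is only defined on the open interval (0, 1), so handle the ends.
    if p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(p) - mu)


def eps_delta_tradeoff(alpha: float, eps: float, delta: float) -> float:
    """Trade-off function corresponding to (eps, delta)-differential privacy.

    f(alpha) = max(0, 1 - delta - e^eps * alpha, e^{-eps} * (1 - delta - alpha)).
    """
    return max(
        0.0,
        1.0 - delta - math.exp(eps) * alpha,
        math.exp(-eps) * (1.0 - delta - alpha),
    )


if __name__ == "__main__":
    # A larger mu (or eps) pushes the curve down: the adversary's test gets easier.
    for alpha in (0.05, 0.25, 0.5):
        print(alpha, gaussian_tradeoff(alpha, mu=1.0), eps_delta_tradeoff(alpha, eps=1.0, delta=1e-5))
```

With `mu = 0` (or `eps = 0` and `delta = 0`) both functions reduce to `1 - alpha`, the curve for two indistinguishable distributions, which corresponds to perfect privacy.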


Federated $f$-Differential Privacy

arXiv.org Artificial Intelligence

Unlike traditional distributed training approaches that upload all the data to central servers, federated learning performs on-device training, and only summaries of local data or local models are exchanged among clients. Typically, the clients repeatedly upload their local models to the server and share the resulting global average. This offers a plausible solution to the critical data privacy issue: sensitive information about individuals, such as typing history, shopping transactions, geographical locations, and medical records, would stay localized. Nonetheless, a malicious client who participates in federated learning might still be able to learn information about the other clients' data through the shared model's weights. This is because an adversary can learn about or even identify certain individuals simply by tweaking the input datasets and probing the output of the algorithm [FJR15, SSSS17]. This gives rise to a pressing call for privacy-preserving federated learning algorithms. Accordingly, we urgently need a rigorous and principled framework to enhance data privacy and to quantitatively answer two important questions: Can another client identify the presence or absence of any individual record in my data in federated learning? Worse, what if all the other clients ally with each other to attack my data?
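The upload-and-average loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's algorithm: `local_update` and `federated_round` are hypothetical names, the local objective (squared distance to the client's own data mean) is made up for the example, and no privacy mechanism such as noise addition is included.

```python
from typing import List, Sequence


def local_update(weights: Sequence[float], data: Sequence[Sequence[float]], lr: float = 0.1) -> List[float]:
    """Hypothetical on-device step: one gradient step on sum_j (w_j - mean_j)^2.

    Only the updated weights leave the device; the raw rows in `data` do not.
    """
    n = len(data)
    means = [sum(row[j] for row in data) / n for j in range(len(weights))]
    return [w - lr * 2.0 * (w - m) for w, m in zip(weights, means)]


def federated_round(global_weights: Sequence[float], client_datasets: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """One communication round: each client trains locally, the server averages."""
    local_models = [local_update(global_weights, data) for data in client_datasets]
    k = len(local_models)
    return [sum(model[j] for model in local_models) / k for j in range(len(global_weights))]


if __name__ == "__main__":
    # Two clients, each holding private one-dimensional records.
    clients = [[[1.0]], [[3.0]]]
    weights = [0.0]
    for _ in range(100):
        weights = federated_round(weights, clients)
    print(weights)  # the shared model approaches the average of the client means
```

Even in this toy setting the shared weights depend on every client's data, which is why the averaged model can leak information and why a formal privacy guarantee is needed on top of data locality.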