privacy level
Sketched Gaussian Mechanism for Private Federated Learning
Communication cost and privacy are two major considerations in federated learning (FL). For communication cost, gradient compression by sketching the clients' transmitted model updates is often used to reduce per-round communication. For privacy, the Gaussian mechanism (GM), which consists of clipping updates and adding Gaussian noise, is commonly used to guarantee client-level differential privacy. Existing literature on private FL analyzes the privacy of sketching and GM in isolation, showing that sketching provides privacy determined by the sketching dimension and that GM has to supply any additional desired privacy. In this paper, we introduce the Sketched Gaussian Mechanism (SGM), which directly combines sketching and the Gaussian mechanism for privacy.
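The two ingredients named in this abstract, clipping plus Gaussian noise and a linear sketch, can be illustrated with a minimal sketch. It is not the paper's SGM: the Gaussian sketching matrix, the order of operations (clip, then sketch, then add noise), and all names and parameter values here are assumptions chosen for illustration, and no privacy accounting is performed.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Scale the update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return update * scale

def sketch_then_noise(update, sketch_matrix, clip_norm, noise_std, rng):
    """Clip, project to a lower dimension, then add Gaussian noise.

    Assumption: noise is added after sketching; the abstract does not
    specify the exact construction used by the authors.
    """
    clipped = clip_update(update, clip_norm)
    sketched = sketch_matrix @ clipped  # compress from d to b dimensions
    return sketched + rng.normal(0.0, noise_std, size=sketched.shape)

rng = np.random.default_rng(0)
d, b = 1000, 50  # hypothetical model and sketch dimensions
# Hypothetical choice: dense Gaussian sketch with variance 1/b per entry.
S = rng.normal(0.0, 1.0 / np.sqrt(b), size=(b, d))
update = rng.normal(size=d)
private = sketch_then_noise(update, S, clip_norm=1.0, noise_std=0.5, rng=rng)
```

Each client would transmit only the `b`-dimensional vector `private`, which is where the communication saving comes from.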
Minimax Risks and Optimal Procedures for Estimation under Functional Local Differential Privacy
Such concerns are shared by politics and industry, leading to the adoption of France's "Loi pour une République numérique (Law for the Digital Republic)" in October 2016 (Algan et al., 2016), EU's General Data Protection Regulation in May 2018, and the California Consumer Privacy Act (Wang et al., 2022), all of which regulate data protection, collection, and processing.
Power Mechanism: Private Tabular Representation Release for Model Agnostic Consumption
Vepakomma, Praneeth, Ponkshe, Kaustubh
Traditional collaborative learning approaches are based on sharing model weights between clients and a server. However, schemes based on sharing embeddings (activations) created from the data offer advantages in resource efficiency. Several differentially private methods have been developed for sharing weights, but no such mechanisms exist so far for sharing embeddings. We propose Ours to learn a privacy encoding network in conjunction with a small utility generation network such that the final embeddings generated from it are equipped with formal differential privacy guarantees. These privatized embeddings are then shared with a more powerful server, which learns a post-processing step that yields higher accuracy on machine learning tasks. We show that our co-design of collaborative and private learning requires only one round of privatized communication and less compute on the client than traditional methods. The privatized embeddings shared from the client are agnostic to the type of model (deep learning, random forests, or XGBoost) that the server uses to process these activations and complete a task.
Differentially-private text generation degrades output language quality
Ensuring user privacy by synthesizing data from large language models (LLMs) tuned under differential privacy (DP) has become popular recently. However, the impact of DP fine-tuned LLMs on the quality of the language and the utility of the texts they produce has not been investigated. In this work, we tune five LLMs with three corpora under four levels of privacy and assess the length, the grammatical correctness, and the lexical diversity of the text outputs they produce. We also probe the utility of the synthetic outputs in downstream classification tasks such as book genre recognition based on book descriptions and cause of death recognition based on verbal autopsies. The results indicate that LLMs tuned under stronger privacy constraints produce texts that are shorter by at least 77%, less grammatically correct by at least 9%, and less diverse by at least 10% in bi-gram diversity. Furthermore, the accuracy they reach in downstream classification tasks decreases, which might be detrimental to the usefulness of the generated synthetic data.
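One common way to measure bi-gram diversity is the ratio of distinct bigrams to total bigrams. The sketch below assumes that definition and plain whitespace tokenization; the abstract does not state the exact metric or tokenizer the authors used, so `bigram_diversity` is a hypothetical helper.

```python
def bigram_diversity(text):
    """Return distinct bigrams divided by total bigrams (0.0 if none)."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    return len(set(bigrams)) / len(bigrams)

repetitive = bigram_diversity("a b a b")      # bigrams: (a,b), (b,a), (a,b)
varied = bigram_diversity("the cat sat")      # bigrams: (the,cat), (cat,sat)
```

Under this definition, repetitive text scores lower than varied text, which is the direction of the degradation the abstract reports.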
An Interactive Framework for Finding the Optimal Trade-off in Differential Privacy
Yang, Yaohong, Rehn, Aki, Katt, Sammie, Honkela, Antti, Kaski, Samuel
Differential privacy (DP) is the standard for privacy-preserving analysis, and introduces a fundamental trade-off between privacy guarantees and model performance. Selecting the optimal balance is a critical challenge that can be framed as a multi-objective optimization (MOO) problem where one first discovers the set of optimal trade-offs (the Pareto front) and then learns a decision-maker's preference over them. While a rich body of work on interactive MOO exists, the standard approach -- modeling the objective functions with generic surrogates and learning preferences from simple pairwise feedback -- is inefficient for DP because it fails to leverage the problem's unique structure: a point on the Pareto front can be generated directly by maximizing accuracy for a fixed privacy level. Motivated by this property, we first derive the shape of the trade-off theoretically, which allows us to model the Pareto front directly and efficiently. To address inefficiency in preference learning, we replace pairwise comparisons with a more informative interaction. In particular, we present the user with hypothetical trade-off curves and ask them to pick their preferred trade-off. Our experiments on differentially private logistic regression and deep transfer learning across six real-world datasets show that our method converges to the optimal privacy-accuracy trade-off with significantly less computational cost and user interaction than baselines.
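The notion of a Pareto front over privacy and accuracy can be made concrete with a small filter. This is a generic non-dominated filter over already evaluated candidates, not the authors' method (which models the front directly); it assumes each candidate is an `(epsilon, accuracy)` pair where a smaller epsilon means stronger privacy and a larger accuracy is better.

```python
def pareto_front(points):
    """Keep the (epsilon, accuracy) pairs that no other pair dominates.

    A pair is dominated if another pair has epsilon no larger and accuracy
    no smaller, with at least one of the two strictly better.
    """
    front = []
    best_acc = float("-inf")
    # Sort by increasing epsilon; for ties, highest accuracy first.
    for eps, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:  # strictly better than everything more private
            front.append((eps, acc))
            best_acc = acc
    return front

# Hypothetical evaluation results, for illustration only.
candidates = [(1.0, 0.70), (0.5, 0.60), (2.0, 0.65), (2.0, 0.80), (0.5, 0.55)]
front = pareto_front(candidates)
```

Training once per fixed privacy level and keeping the best accuracy at each level produces candidates of this form.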
Strategic Incentivization for Locally Differentially Private Federated Learning
Pagoti, Yashwant Krishna, Sinha, Arunesh, Sural, Shamik
In Federated Learning (FL), multiple clients jointly train a machine learning model by sharing gradient information, instead of raw data, with a server over multiple rounds. To address the possibility of information leakage in spite of sharing only the gradients, Local Differential Privacy (LDP) is often used. In LDP, clients add a selective amount of noise to the gradients before sending them to the server. Although such noise addition protects the privacy of clients, it leads to a degradation in global model accuracy. In this paper, we model this privacy-accuracy trade-off as a game, where the server incentivizes the clients to add a lower degree of noise for achieving higher accuracy, while the clients attempt to preserve their privacy at the cost of a potential loss in accuracy. A token-based incentivization mechanism is introduced in which the quantum of tokens credited to a client in an FL round is a function of the degree of perturbation of its gradients. The client can later access a newly updated global model only after acquiring enough tokens, which are to be deducted from its balance. We identify the players, their actions and payoff, and perform a strategic analysis of the game. Extensive experiments were carried out to study the impact of different parameters.
Federated Learning (FL) allows multiple clients to train a model by sharing their local gradients with a central server for training over multiple rounds. To further prevent data leakage through different forms of inference attacks on FL [1], use of Local Differential Privacy (LDP) has been proposed [2]. However, LDP-FL faces a critical challenge in ensuring fair participation while attempting to achieve accuracy of the global model and respecting the privacy concerns of individual clients. The clients tend to contribute differently to the model as their degree of participation varies based on a privacy budget and the perceived value of their contributions.
Thus, there are two opposing factors affecting the success of an LDP-FL setup. The goal of the server is to achieve high global model accuracy, so it would prefer the least possible perturbation of gradients by the clients. The clients, on the other hand, are more inclined to behave in a way that protects their privacy and tend to add more noise to their gradients. However, if all the clients overly perturb their gradients, eventually the accuracy of the global model will suffer, rendering the LDP-FL process ineffective.
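The token bookkeeping described in the abstract can be sketched as follows. The reward rule `max_tokens / (1 + noise_scale)`, the access cost, and all names are hypothetical; the source only states that tokens credited are a function of the degree of perturbation and that accessing an updated global model deducts tokens from the client's balance.

```python
from dataclasses import dataclass

@dataclass
class Client:
    balance: float = 0.0  # tokens currently held by the client

def tokens_for_round(noise_scale, max_tokens=10.0):
    """Hypothetical reward rule: more noise earns fewer tokens."""
    return max_tokens / (1.0 + noise_scale)

def credit(client, noise_scale):
    """Credit tokens for one FL round based on the client's noise level."""
    client.balance += tokens_for_round(noise_scale)

def try_access_model(client, cost):
    """Grant access to the updated global model only if tokens suffice."""
    if client.balance >= cost:
        client.balance -= cost
        return True
    return False

client = Client()
credit(client, noise_scale=1.0)             # noisy round earns 5.0 tokens
denied = try_access_model(client, 8.0)      # balance 5.0 < 8.0: refused
credit(client, noise_scale=0.0)             # noise-free round earns 10.0
granted = try_access_model(client, 8.0)     # balance 15.0 >= 8.0: granted
```

Any decreasing function of the noise level would express the same incentive; the choice here only keeps the arithmetic easy to follow.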