approximate differential privacy
Mind the Gap: Mixtures of Gaussians in Approximate Differential Privacy
Liu, Huikang, Selvi, Aras, Wiesemann, Wolfram
We design a class of additive noise mechanisms that satisfy \((\varepsilon, δ)\)-differential privacy (DP) for scalar, real-valued query functions with known sensitivities, with a particular focus on moderate and low-privacy regimes. These mechanisms, which we call \textit{mixture mechanisms}, are constructed by mixing multiple Gaussian distributions that share the same variance but differ in their means and mixture weights. The resulting distributions can be interpreted as convex combinations of a zero-mean Gaussian (as used in the analytic Gaussian mechanism) and additional Gaussians whose means depend on the sensitivity of the query function. We derive tight conditions on the variances required for \((\varepsilon, δ)\)-DP and provide efficient algorithms to compute them. Compared to the analytic Gaussian mechanism, our mechanisms yield substantially lower expected noise amplitudes (\(l_1\)-loss) and variances (\(l_2\)-loss for zero-mean distributions). In the low-privacy regime that motivates our design, our mechanisms approach optimality, mitigating nearly all of the optimality gap of the analytic Gaussian mechanism.
Privately Learning Mixtures of Axis-Aligned Gaussians
We consider the problem of learning mixtures of Gaussians under the constraint of approximate differential privacy. We prove that eO(k2dlog3/2(1/δ)/α2ε) samples are sufficient to learn a mixture of k axis-aligned Gaussians in Rd to within total variation distance αwhile satisfying (ε,δ)-differential privacy. This is the first result for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. If the covariance matrices of each of the Gaussians is the identity matrix, we show that eO(kd/α2 + kdlog(1/δ)/αε) samples are sufficient. To prove our results, we design a new technique for privately learning mixture distributions. A class of distributions F is said to be list-decodable if there is an algorithm that, given "heavily corrupted" samples from f F, outputs a list of distributions one of which approximates f. We show that if F is privately list-decodable then we can learn mixtures of distributions in F. Finally, we show axis-aligned Gaussian distributions are privately list-decodable, thereby proving mixtures of such distributions are privately learnable.
Minimax Optimal Estimation of Approximate Differential Privacy on Neighboring Databases
Differential privacy has become a widely accepted notion of privacy, leading to the introduction and deployment of numerous privatization mechanisms. However, ensuring the privacy guarantee is an error-prone process, both in designing mechanisms and in implementing those mechanisms. Both types of errors will be greatly reduced, if we have a data-driven approach to verify privacy guarantees, from a black-box access to a mechanism. We pose it as a property estimation problem, and study the fundamental trade-offs involved in the accuracy in estimated privacy guarantees and the number of samples required. We introduce a novel estimator that uses polynomial approximation of a carefully chosen degree to optimally trade-off bias and variance. With n samples, we show that this estimator achieves performance of a straightforward plug-in estimator with n*log(n) samples, a phenomenon referred to as effective sample size amplification. The minimax optimality of the proposed estimator is proved by comparing it to a matching fundamental lower bound.
The Large Margin Mechanism for Differentially Private Maximization
A basic problem in the design of privacy-preserving algorithms is the private maximization problem: the goal is to pick an item from a universe that (approximately) maximizes a data-dependent function, all under the constraint of differential privacy. This problem has been used as a sub-routine in many privacy-preserving algorithms for statistics and machine learning. Previous algorithms for this problem are either range-dependent--i.e., their utility diminishes with the size of the universe--or only apply to very restricted function classes. This work provides the first general purpose, range-independent algorithm for private maximization that guarantees approximate differential privacy. Its applicability is demonstrated on two fundamental tasks in data mining and machine learning.
Reviews: Minimax Optimal Estimation of Approximate Differential Privacy on Neighboring Databases
The authors study the rates of estimating approximate differential privacy (aDP). They do so by reformulating it as a property estimation problem. I find this reduction fairly novel and ties DP to a large body of work on polynomial estimation. In property estimation, it is known that carefully trading off the bias and variance via polynomial approximation, particularly in regions of low probability, allows for obtaining the optimal min max rates. The authors follow the same recipe and show that the min max error scales as Se \epsilon / n \log n.