Private Mean Estimation of Heavy-Tailed Distributions

Gautam Kamath, Vikrant Singhal, Jonathan Ullman

arXiv.org Machine Learning 

Given samples X_1, ..., X_n from a distribution D, can we estimate the mean of D? This is the problem of mean estimation, which is, alongside hypothesis testing, one of the most fundamental questions in statistics. As a result, answers to this problem are known in fairly general settings. For instance, the empirical mean is known to be an optimal estimate of a distribution's true mean under minimal assumptions. That said, statistics like the empirical mean set aside any concerns about sensitivity, and might vary significantly with the addition of a single datapoint to the dataset. While this is not an inherently negative feature, it becomes a problem when the dataset contains personal information, and large shifts based on a single datapoint could potentially violate the corresponding individual's privacy. In order to assuage these concerns, we consider the problem of mean estimation under the constraint of differential privacy (DP) [DMNS06], considered by many to be the gold standard of data privacy. Informally, an algorithm is said to be differentially private if its distribution over outputs is insensitive to the addition or removal of a single datapoint from the dataset. Differential privacy has enjoyed widespread adoption, including deployment by Apple [Dif17], Google [EPK14], Microsoft [DKY17], and the US Census Bureau for the 2020 Census [DLS+17]. In this vein, a recent line of work [KV18, KLSU19, BKSW19] gives nearly optimal differentially private algorithms for mean estimation of sub-Gaussian random variables.
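To make the sensitivity concern concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a clipped mean. It is not the algorithm of this paper or of the cited works; it only illustrates why bounding each datapoint's influence matters. The function names, the clipping range [lower, upper], and the privacy parameter epsilon are assumptions introduced for illustration. The sketch also assumes the sample size n is public and that neighbouring datasets differ by replacing one datapoint, a common simplification of the add/remove notion described above.

```python
import random


def clipped_mean(samples, lower, upper):
    """Average the samples after clipping each one to [lower, upper]."""
    clipped = [min(max(x, lower), upper) for x in samples]
    return sum(clipped) / len(clipped)


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(samples, lower, upper, epsilon, rng=None):
    """Illustrative epsilon-DP mean via clipping plus Laplace noise.

    Assumptions (not from the source): n is public, and neighbouring
    datasets differ by replacing one point. This is a teaching sketch,
    not a hardened implementation.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")
    if len(samples) == 0:
        raise ValueError("samples must be non-empty")
    rng = rng if rng is not None else random.Random()
    # Replacing one clipped point moves the mean by at most this much.
    sensitivity = (upper - lower) / len(samples)
    noise = laplace_noise(sensitivity / epsilon, rng)
    return clipped_mean(samples, lower, upper) + noise


if __name__ == "__main__":
    data = [0.1, 0.4, 0.5, 0.9, 1000.0]  # one extreme outlier
    print("unclipped mean:", sum(data) / len(data))
    print("clipped mean:  ", clipped_mean(data, 0.0, 1.0))
    print("private mean:  ", private_mean(data, 0.0, 1.0, epsilon=1.0))
```

Without clipping, the single outlier dominates the empirical mean. With clipping, one datapoint can move the result by at most (upper - lower) / n, which is what allows the noise to be calibrated. The cost is that a fixed clipping range presupposes knowledge of where the data lie, which is the kind of assumption that becomes delicate for heavy-tailed distributions.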
