Private Mean Estimation of Heavy-Tailed Distributions

Kamath, Gautam, Singhal, Vikrant, Ullman, Jonathan

Feb-21-2020–arXiv.org Machine Learning

Given samples X_1, ..., X_n from a distribution D, can we estimate the mean of D? This is the problem of mean estimation, which is, alongside hypothesis testing, one of the most fundamental questions in statistics. As a result, answers to this problem are known in fairly general settings. For instance, the empirical mean is known to be an optimal estimate of a distribution's true mean under minimal assumptions. That said, statistics like the empirical mean put aside any concerns related to sensitivity, and might vary significantly based on the addition of a single datapoint to the dataset. While this is not an inherently negative feature, it becomes a problem when the dataset contains personal information, since a large shift caused by a single datapoint could violate the corresponding individual's privacy. To assuage these concerns, we consider the problem of mean estimation under the constraint of differential privacy (DP) [DMNS06], considered by many to be the gold standard of data privacy. Informally, an algorithm is said to be differentially private if its distribution over outputs is insensitive to the addition or removal of a single datapoint from the dataset. Differential privacy has enjoyed widespread adoption, including deployment by Apple [Dif17], Google [EPK14], Microsoft [DKY17], and the US Census Bureau for the 2020 Census [DLS+17]. In this vein, a recent line of work [KV18, KLSU19, BKSW19] gives nearly optimal differentially private algorithms for mean estimation of sub-Gaussian random variables.
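To make the sensitivity concern concrete, the following is a minimal sketch and not the algorithm from this paper. It contrasts the empirical mean with a textbook clip-and-add-Laplace-noise estimator. The function names, the clipping radius `clip_radius`, and the choice to treat n as public (so that neighboring datasets differ by replacing one point, giving sensitivity 2R/n) are all assumptions made for illustration. Clipping also biases the estimate when the data are heavy-tailed, which is part of what makes the heavy-tailed setting harder than this sketch suggests.

```python
import random


def empirical_mean(xs):
    """Plain average; a single extreme point can move it arbitrarily far."""
    return sum(xs) / len(xs)


def clipped_mean(xs, clip_radius):
    """Average after clipping every point to [-clip_radius, clip_radius]."""
    clipped = [max(-clip_radius, min(clip_radius, x)) for x in xs]
    return sum(clipped) / len(clipped)


def clipped_laplace_mean(xs, clip_radius, epsilon, rng):
    """Illustrative epsilon-DP mean via the Laplace mechanism.

    Assumption: n is public and neighbors differ by replacing one point,
    so the clipped mean changes by at most 2 * clip_radius / n.
    """
    n = len(xs)
    sensitivity = 2.0 * clip_radius / n
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped_mean(xs, clip_radius) + noise


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    # Neighboring dataset: replace one point with an extreme value.
    neighbor = data[:-1] + [1e6]

    print("empirical mean:", empirical_mean(data), "vs", empirical_mean(neighbor))
    print("clipped mean:  ", clipped_mean(data, 5.0), "vs", clipped_mean(neighbor, 5.0))
    print("private mean:  ", clipped_laplace_mean(data, 5.0, 1.0, rng))
```

In this sketch a single replaced point shifts the empirical mean by roughly 1e6/n, while the clipped mean moves by at most 2R/n, which is the bound the Laplace noise is calibrated to.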



