Controlling Privacy Loss in Sampling Schemes: an Analysis of Stratified and Cluster Sampling

Mark Bun, Jörg Drechsler, Marco Gaboardi, Audra McMillan, Jayshree Sarathy

arXiv.org Artificial Intelligence 

Sampling schemes are fundamental tools in statistics, survey design, and algorithm design. For example, social science researchers use them to conduct surveys on a random sample of a target population, and machine learning uses them to improve the efficiency and accuracy of algorithms on large datasets. In many of these applications, however, the datasets are sensitive and privacy is a concern. Intuition suggests that (sub)sampling a dataset before analysing it provides additional privacy, since it gives individuals plausible deniability about whether their data was included. This intuition has been formalized for some types of sampling schemes (such as simple random sampling with and without replacement, and Poisson sampling) in a series of papers in the differential privacy literature [23, 34, 11, 32]. Such results on privacy amplification by subsampling can yield tight privacy accounting when analysing algorithms that incorporate subsampling, e.g., [33, 1, 21, 28, 19]. In practice, however, sampling designs are often more complex than the simple, data-independent sampling schemes addressed in prior work.
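To make the amplification idea concrete, here is a minimal Python sketch of the simplest case covered by that prior work, Poisson sampling, together with the standard amplification bound. It assumes the add/remove-one neighbouring relation: if a mechanism is (ε, δ)-differentially private, then running it on a Poisson sample that keeps each record independently with probability q is (ln(1 + q(e^ε − 1)), qδ)-differentially private. The function names `poisson_sample` and `amplified_privacy` are hypothetical, chosen for illustration. This is not the authors' method, and it does not cover the stratified or cluster sampling designs named in the title.

```python
import math
import random


def poisson_sample(records, q, rng=None):
    """Poisson sampling: keep each record independently with probability q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must lie in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so q = 1 keeps everything, q = 0 nothing.
    return [r for r in records if rng.random() < q]


def amplified_privacy(epsilon, delta, q):
    """Standard amplification bound for Poisson subsampling.

    Assumes add/remove-one neighbouring datasets. If M is (epsilon, delta)-DP,
    then M applied to a Poisson sample with rate q is
    (log(1 + q * (e^epsilon - 1)), q * delta)-DP.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must lie in [0, 1]")
    # log1p/expm1 keep the computation accurate when q or epsilon is small.
    eps_amplified = math.log1p(q * math.expm1(epsilon))
    return eps_amplified, q * delta


if __name__ == "__main__":
    sample = poisson_sample(range(1000), q=0.01, rng=random.Random(0))
    print(len(sample), amplified_privacy(1.0, 1e-6, 0.01))
```

For example, with ε = 1 and q = 0.01 the bound gives an amplified ε of roughly 0.017: the smaller the sampling rate, the stronger the amplification. Using `math.log1p` and `math.expm1` rather than `math.log(1 + ...)` avoids loss of precision in exactly the small-q regime where amplification matters most.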
