Controlling Privacy Loss in Sampling Schemes: an Analysis of Stratified and Cluster Sampling

Mark Bun, Jörg Drechsler, Marco Gaboardi, Audra McMillan, Jayshree Sarathy

Jun-21-2023, arXiv.org Artificial Intelligence

Sampling schemes are fundamental tools in statistics, survey design, and algorithm design. For example, they are used in social science research to conduct surveys on a random sample of a target population. They are also used in machine learning to improve the efficiency and accuracy of algorithms on large datasets. In many of these applications, however, the datasets are sensitive and privacy is a concern. Intuition suggests that (sub)sampling a dataset before analysing it provides additional privacy, since it gives individuals plausible deniability about whether their data was included. This intuition has been formalized for some types of sampling schemes (such as simple random sampling with and without replacement, and Poisson sampling) in a series of papers in the differential privacy literature [23, 34, 11, 32]. These privacy-amplification-by-subsampling results can provide tight privacy accounting when analysing algorithms that incorporate subsampling, e.g., [33, 1, 21, 28, 19]. In practice, however, sampling designs are often more complex than the simple, data-independent sampling schemes addressed in prior work.
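
To make the amplification intuition concrete, the sketch below computes the standard bound for the simple, data-independent case: if a mechanism is (epsilon, delta)-differentially private and is run on a Poisson subsample in which each record is kept independently with probability q, the overall procedure satisfies (log(1 + q(e^epsilon - 1)), q * delta)-differential privacy. This is the well-known result from the prior literature that the abstract refers to, not the analysis of stratified or cluster sampling developed in the paper. The function names are hypothetical and chosen only for illustration.

```python
import math


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Epsilon after Poisson subsampling with rate q (illustrative helper).

    Implements epsilon' = log(1 + q * (exp(epsilon) - 1)), the standard
    amplification-by-subsampling bound for data-independent sampling.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must lie in [0, 1]")
    # log1p and expm1 keep the computation accurate for small epsilon and q.
    return math.log1p(q * math.expm1(epsilon))


def amplified_delta(delta: float, q: float) -> float:
    """Delta after Poisson subsampling with rate q (illustrative helper)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must lie in [0, 1]")
    return q * delta


if __name__ == "__main__":
    # Example values are assumptions chosen only to show the effect.
    eps, delta, rate = 1.0, 1e-6, 0.1
    print(amplified_epsilon(eps, rate), amplified_delta(delta, rate))
```

For small epsilon the amplified value is roughly q * epsilon, which is why sampling a small fraction of the data can reduce privacy loss substantially. The bound relies on the sampling probabilities not depending on the data; the more complex designs the abstract mentions fall outside that assumption.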

artificial intelligence, machine learning, privacy amplification

Country:
- Oceania
  - New Zealand > North Island
    - Auckland Region > Auckland (0.04)
  - Australia > New South Wales
    - Sydney (0.04)
- North America
  - United States
    - District of Columbia > Washington (0.04)
    - Maryland (0.04)
    - California > Alameda County
      - Berkeley (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - Alberta > Census Division No. 15
      - Improvement District No. 9 > Banff (0.04)
- Europe
  - Austria > Vienna (0.14)
  - Germany (0.04)
  - Switzerland > Zürich
    - Zürich (0.14)
  - France > Hauts-de-France
    - Nord > Lille (0.04)
- Asia > Japan
  - Kyūshū & Okinawa > Okinawa (0.04)

Genre:
- Research Report (0.64)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning (1.00)
