Unleashing the Power of Randomization in Auditing Differentially Private ML
Krishna Pillutla, Galen Andrew, Peter Kairouz, H. Brendan McMahan, Alina Oprea, Sewoong Oh
arXiv.org Artificial Intelligence
Differential privacy (DP), introduced in [21], has gained widespread adoption by governments, companies, and researchers by formally ensuring plausible deniability for participating individuals. This is achieved by guaranteeing that a curious observer of the output of a query cannot be confident in their answer to the following binary hypothesis test: did a particular individual participate in the dataset or not? For example, introducing sufficient randomness when training a model on a certain dataset ensures a desired level of differential privacy. This in turn ensures that an individual's sensitive information cannot be inferred from the trained model with high confidence. However, calibrating the right amount of noise can be a challenging process. It is easy to make mistakes when implementing a DP mechanism as it can involve intricacies like micro-batching, sensitivity analysis, and privacy accounting. Even with a correct implementation, there are several known incidents of published DP algorithms with miscalculated privacy guarantees that falsely report higher levels of privacy [16, 33, 39, 46, 56, 57]. Data-driven approaches to auditing a mechanism for a violation of a claimed privacy guarantee can significantly mitigate the danger of unintentionally leaking sensitive data.
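The hypothesis-testing view above suggests a simple data-driven audit, sketched here under stated assumptions rather than as the method of this paper. Any (ε, δ)-DP mechanism forces every membership test to satisfy FPR + e^ε · FNR ≥ 1 − δ (and the same with FPR and FNR swapped), so observed error rates imply an empirical estimate of ε. The example uses a Laplace mechanism on a counting query as a stand-in for a trained model; the names `laplace_mechanism`, `empirical_epsilon`, and `audit`, the base count of 100, and the midpoint threshold are all hypothetical choices made for illustration.

```python
import math
import random


def laplace_mechanism(true_count, epsilon, rng):
    """Release a count (sensitivity 1) with Laplace noise of scale 1/epsilon."""
    # The difference of two i.i.d. Exp(rate=epsilon) draws is Laplace(0, 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


def empirical_epsilon(fpr, fnr, delta=0.0):
    """Point estimate of the epsilon implied by a test's observed error rates.

    Uses FPR + e^eps * FNR >= 1 - delta and its mirror image. This is a plain
    point estimate; a rigorous audit would replace fpr and fnr with
    confidence bounds before taking the logarithm.
    """
    candidates = [0.0]
    if fnr > 0 and 1 - delta - fpr > 0:
        candidates.append(math.log((1 - delta - fpr) / fnr))
    if fpr > 0 and 1 - delta - fnr > 0:
        candidates.append(math.log((1 - delta - fnr) / fpr))
    return max(candidates)


def audit(epsilon, trials=20000, seed=0):
    """Run a threshold membership test against the mechanism and estimate epsilon."""
    rng = random.Random(seed)
    base_count = 100              # hypothetical count without the target individual
    threshold = base_count + 0.5  # guess "participated" when the output exceeds this
    # False positive: individual absent, but the test says present.
    false_pos = sum(
        laplace_mechanism(base_count, epsilon, rng) > threshold
        for _ in range(trials)
    )
    # False negative: individual present, but the test says absent.
    false_neg = sum(
        laplace_mechanism(base_count + 1, epsilon, rng) <= threshold
        for _ in range(trials)
    )
    return empirical_epsilon(false_pos / trials, false_neg / trials)


if __name__ == "__main__":
    print(f"claimed epsilon = 1.0, empirical estimate = {audit(1.0):.3f}")
```

For a correctly implemented mechanism the estimate should stay below the claimed ε; an estimate clearly above it, once sampling error is accounted for, would indicate a violation of the claimed guarantee.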
May-28-2023