Learning from End User Data with Shuffled Differential Privacy over Kernel Densities
–arXiv.org Artificial Intelligence
We study a setting of collecting and learning from private data distributed across end users. In the shuffled model of differential privacy, the end users partially protect their data locally before sharing it, and their data is also anonymized during its collection to enhance privacy. This model has recently become a prominent alternative to central DP, which requires full trust in a central data curator, and to local DP, where fully local data protection takes a steep toll on downstream accuracy. Our main technical result is a shuffled DP protocol for privately estimating the kernel density function of a distributed dataset, with accuracy essentially matching central DP. We use it to privately learn a classifier from the end user data, by learning a private density function per class. Moreover, we show that the density function itself can recover the semantic content of its class, despite having been learned in the absence of any unprotected data. Our experiments show the favorable downstream performance of our approach, and highlight key considerations and trade-offs in a practical ML deployment of shuffled DP.

Collecting statistics on end user data is commonly required in data analytics and machine learning. As it could leak private user information, privacy guarantees need to be incorporated into the data collection pipeline. Differential Privacy (DP) (Dwork et al., 2006) currently serves as the gold standard for privacy in machine learning. Most of its success has been in the central DP model, where a centralized data curator holds the private data of all the users and is charged with protecting their privacy. However, this model does not address how to collect the data from end users in the first place. The local DP model (Kasiviswanathan et al., 2011), where end users protect the privacy of their data locally before sharing it, is often used for private data collection (Erlingsson et al., 2014; Ding et al., 2017; Apple, 2017).
However, compared to central DP, local DP often comes at a steep price of degraded accuracy in downstream uses of the collected data. The shuffled DP model (Bittau et al., 2017; Cheu et al., 2019; Erlingsson et al., 2019) has recently emerged as a prominent intermediate alternative. In this model, the users partially protect their data locally, and then entrust a centralized authority, called the "shuffler", with the single operation of shuffling (or anonymizing) the data from all participating users.
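The per-class density approach described above can be illustrated with a minimal sketch: estimate a kernel density for each class, perturb it with noise standing in for the privacy mechanism, and classify a point by the larger noisy density. This is a toy illustration, not the paper's shuffled-DP protocol; the Laplace noise scale, bandwidth, and grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def kde(points, grid, bandwidth):
    # Gaussian kernel density estimate of `points`, evaluated on `grid`.
    diffs = (grid[:, None] - points[None, :]) / bandwidth
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (len(points) * bandwidth * np.sqrt(2 * np.pi))

# Toy 1-D data: two classes with different means.
x0 = rng.normal(-1.0, 0.5, 500)   # class 0
x1 = rng.normal(+1.0, 0.5, 500)   # class 1
grid = np.linspace(-4, 4, 81)

eps = 1.0  # privacy parameter (illustrative value)

def noisy_density(points):
    d = kde(points, grid, bandwidth=0.3)
    # Laplace noise stands in for the protocol's privacy noise;
    # this calibration is a placeholder, not the paper's analysis.
    return d + rng.laplace(scale=1.0 / (eps * len(points)), size=d.shape)

d0, d1 = noisy_density(x0), noisy_density(x1)

def classify(x):
    i = np.argmin(np.abs(grid - x))   # nearest grid point
    return int(d1[i] > d0[i])         # argmax over per-class densities
```

The classifier never touches raw points at prediction time, only the (noisy) density estimates, which mirrors the idea of learning one private density function per class.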
Feb-19-2025
- Country:
  - Africa > Middle East
    - Egypt > Red Sea Governorate > Hurghada (0.04)
    - Morocco > Casablanca-Settat Region > Casablanca (0.04)
  - Asia > Middle East
    - Israel > Tel Aviv District > Tel Aviv (0.04)
    - Republic of Türkiye > Diyarbakir Province > Diyarbakir (0.04)
  - Europe
    - Bulgaria > Pazardzhik Province > Pazardzhik (0.04)
    - Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
  - North America > United States
    - California > Santa Barbara County > Santa Barbara (0.04)
    - Washington > King County > Seattle (0.04)
- Genre:
- Research Report (0.64)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning > Statistical Learning (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning (0.93)
      - Data Science > Data Mining (1.00)
    - Security & Privacy (1.00)