Differentially private low-dimensional representation of high-dimensional data
He, Yiyun, Strohmer, Thomas, Vershynin, Roman, Zhu, Yizhe
arXiv.org Artificial Intelligence
Differentially private synthetic data provide a powerful mechanism to enable data analysis while protecting sensitive information about individuals. However, when the data lie in a high-dimensional space, the accuracy of the synthetic data suffers from the curse of dimensionality. In this paper, we propose a differentially private algorithm that efficiently generates low-dimensional synthetic data from a high-dimensional dataset, with a utility guarantee with respect to the Wasserstein distance. A key step of our algorithm is a private principal component analysis (PCA) procedure with a near-optimal accuracy bound that circumvents the curse of dimensionality. Unlike the standard perturbation analysis based on the Davis-Kahan theorem, our analysis of private PCA does not assume a spectral gap for the sample covariance matrix.
May-25-2023
- Country:
- North America > United States (0.68)
- Genre:
- Research Report (0.50)
- Workflow (0.46)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Security & Privacy (1.00)