Differentially Private Distribution Release of Gaussian Mixture Models via KL-Divergence Minimization
Hang Liu, Anna Scaglione, Sean Peisert
arXiv.org Artificial Intelligence
Gaussian Mixture Models (GMMs) are widely used statistical models for representing multi-modal data distributions, with numerous applications in data mining, pattern recognition, data simulation, and machine learning. However, recent research has shown that releasing GMM parameters poses significant privacy risks, potentially exposing sensitive information about the underlying data. In this paper, we address the challenge of releasing GMM parameters while ensuring differential privacy (DP) guarantees. Specifically, we focus on the privacy protection of mixture weights, component means, and covariance matrices. We propose to use Kullback-Leibler (KL) divergence as a utility metric to assess the accuracy of the released GMM, as it captures the joint impact of noise perturbation on all the model parameters. To achieve privacy, we introduce a DP mechanism that adds carefully calibrated random perturbations to the GMM parameters. Through theoretical analysis, we quantify the effects of privacy budget allocation and perturbation statistics on the DP guarantee, and derive a tractable expression for evaluating KL divergence. We formulate and solve an optimization problem to minimize the KL divergence between the released and original models, subject to a given (ϵ, δ)-DP constraint. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach achieves strong privacy guarantees while maintaining high utility.

In recent years, the remarkable success of data-driven artificial intelligence (AI) has spurred an increasing demand for the sharing and analysis of large-scale, multi-class, and high-dimensional datasets across a variety of domains, such as healthcare records, consumer transactions, and mobility traces. Organizations have recognized the potential of sharing data statistics to enhance data mining, improve public services, optimize recommendations, and facilitate data simulation [1].
However, sharing raw data, or even their statistics, raises significant privacy concerns, especially when sensitive attributes of individuals could be inferred.

This research was supported in part by the Director, Cybersecurity, Energy Security, and Emergency Response (CESER) office of the U.S. Department of Energy, via the Privacy-Preserving, Collective Cyberattack Defense of DERs project, under contract DE-AC02-05CH11231.
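The abstract describes the release pipeline only at a high level: perturb the weights, means, and covariances under an (ϵ, δ) budget, then judge utility by the KL divergence between the released and original models. The following is a minimal sketch of those ingredients under explicit assumptions, not the authors' mechanism. It uses a univariate GMM for brevity (the paper treats full covariance matrices); it substitutes the classical Gaussian mechanism, σ = Δ·sqrt(2 ln(1.25/δ))/ϵ (valid for ϵ < 1), for the paper's unspecified "calibrated random perturbations"; it splits the budget across the three parameter groups by basic composition using fixed fractions, whereas the paper optimizes that allocation; and it estimates KL by Monte Carlo sampling instead of the paper's tractable expression. The helper names (`gaussian_sigma`, `release_gmm`, `kl_monte_carlo`), the sensitivities, and the budget fractions are all hypothetical.

```python
import math
import random

# Hypothetical sketch: the classical Gaussian mechanism stands in for the
# paper's (unspecified) calibrated perturbation, and a Monte Carlo estimate
# stands in for its tractable KL expression. Univariate GMM for brevity.

def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale; valid only for 0 < epsilon < 1."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("classical bound needs 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def release_gmm(gmm, sensitivity, split, epsilon, delta, rng, var_floor=1e-3):
    """Perturb (weights, means, variances) so the release is (epsilon, delta)-DP.

    `sensitivity` holds assumed L2 sensitivities and `split` the budget
    fractions, both keyed by "w", "mu", "var". By basic composition the three
    releases together spend (epsilon, delta) when the fractions sum to 1.
    """
    weights, means, variances = gmm
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("budget fractions must sum to 1")
    sig = {k: gaussian_sigma(sensitivity[k], split[k] * epsilon, split[k] * delta)
           for k in ("w", "mu", "var")}
    # Clamping and renormalising are post-processing, so DP is preserved.
    noisy_w = [max(w + rng.gauss(0.0, sig["w"]), 1e-6) for w in weights]
    total = sum(noisy_w)
    noisy_w = [w / total for w in noisy_w]  # back onto the probability simplex
    noisy_mu = [m + rng.gauss(0.0, sig["mu"]) for m in means]
    noisy_var = [max(v + rng.gauss(0.0, sig["var"]), var_floor) for v in variances]
    return noisy_w, noisy_mu, noisy_var


def gmm_logpdf(x, gmm):
    """log p(x) for a univariate GMM, computed with log-sum-exp for stability."""
    terms = [math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
             for w, m, v in zip(*gmm)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def sample_gmm(gmm, rng):
    """Draw one sample: pick a component by weight, then sample that Gaussian."""
    weights, means, variances = gmm
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], math.sqrt(variances[k]))


def kl_monte_carlo(p, q, n, rng):
    """Monte Carlo estimate of KL(p || q) = E_{x~p}[log p(x) - log q(x)]."""
    total = 0.0
    for _ in range(n):
        x = sample_gmm(p, rng)
        total += gmm_logpdf(x, p) - gmm_logpdf(x, q)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    original = ([0.3, 0.7], [-2.0, 1.5], [0.5, 1.0])
    sens = {"w": 0.01, "mu": 0.05, "var": 0.05}   # hypothetical sensitivities
    split = {"w": 0.2, "mu": 0.4, "var": 0.4}     # hypothetical fixed budget split
    released = release_gmm(original, sens, split, epsilon=0.9, delta=1e-5, rng=rng)
    print("released:", released)
    print("KL estimate:", kl_monte_carlo(original, released, 5000, rng))
```

Shifting budget toward one entry of `split` lowers the noise on that parameter group at the cost of the others. Choosing those fractions to minimize the resulting KL divergence is the optimization problem the paper formulates; this sketch fixes them by hand and does not solve it.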
Nov-11-2025