Lin, Yingyu
Purifying Approximate Differential Privacy with Randomized Post-processing
Lin, Yingyu, Wang, Erchi, Ma, Yi-An, Wang, Yu-Xiang
We propose a framework to convert $(\varepsilon, \delta)$-approximate Differential Privacy (DP) mechanisms into $(\varepsilon, 0)$-pure DP mechanisms, a process we call ``purification''. This algorithmic technique leverages randomized post-processing with calibrated noise to eliminate the $\delta$ parameter while preserving utility. By combining the tighter utility bounds and computational efficiency of approximate DP mechanisms with the stronger guarantees of pure DP, our approach achieves the best of both worlds. We illustrate the applicability of this framework in various settings, including Differentially Private Empirical Risk Minimization (DP-ERM), data-dependent DP mechanisms such as Propose-Test-Release (PTR), and query release tasks. To the best of our knowledge, this is the first work to provide a systematic method for transforming approximate DP into pure DP while maintaining competitive accuracy and computational efficiency.
A Skewness-Based Criterion for Addressing Heteroscedastic Noise in Causal Discovery
Lin, Yingyu, Huang, Yuxing, Liu, Wenqin, Deng, Haoran, Ng, Ignavier, Zhang, Kun, Gong, Mingming, Ma, Yi-An, Huang, Biwei
Real-world data often violates the equal-variance assumption (homoscedasticity), making it essential to account for heteroscedastic noise in causal discovery. In this work, we explore heteroscedastic symmetric noise models (HSNMs), where the effect $Y$ is modeled as $Y = f(X) + \sigma(X)N$, with $X$ as the cause and $N$ as independent noise following a symmetric distribution. We introduce a novel criterion for identifying HSNMs based on the skewness of the score (i.e., the gradient of the log density) of the data distribution. This criterion yields a computationally tractable measure that is zero in the causal direction but nonzero in the anticausal direction, enabling discovery of the causal direction. We extend this skewness-based criterion to the multivariate setting and propose SkewScore, an algorithm that handles heteroscedastic noise without requiring the extraction of exogenous noise. We also conduct a case study on the robustness of SkewScore in a bivariate model with a latent confounder, providing theoretical insights into its performance. Empirical studies further validate the effectiveness of the proposed method.
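To make the asymmetry described in the abstract above concrete, here is a minimal numerical sketch. It is not the SkewScore algorithm: it uses a hand-picked toy HSNM (assumptions: $X \sim \mathcal{N}(0,1)$, $f(x) = x$, $\sigma(x) = e^{ax}$, Gaussian $N$) whose score is known in closed form, and it uses the unnormalized third moment of each score component as a stand-in for "skewness". The function name and all parameters are hypothetical.

```python
import math
import random


def score_third_moments(n=200_000, a=0.5, seed=0):
    """Monte Carlo third moments of the analytic score in a toy HSNM.

    Assumed toy model (not from the source):
        X ~ N(0, 1), f(x) = x, sigma(x) = exp(a * x), N ~ N(0, 1),
        Y = f(X) + sigma(X) * N.
    Returns (third moment of d/dy log p, third moment of d/dx log p).
    """
    rng = random.Random(seed)
    sum_y = 0.0
    sum_x = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        sigma = math.exp(a * x)
        # log p(x, y) = log p_X(x) - log sigma(x)
        #               - (y - f(x))^2 / (2 sigma(x)^2) + const
        # Derivative w.r.t. the effect y: odd in the symmetric noise.
        score_y = -noise / sigma
        # Derivative w.r.t. the cause x, using f'(x) = 1, sigma'/sigma = a.
        score_x = -x - a + noise / sigma + a * noise * noise
        sum_y += score_y ** 3
        sum_x += score_x ** 3
    return sum_y / n, sum_x / n


if __name__ == "__main__":
    m_y, m_x = score_third_moments()
    print("third moment, score component along Y:", m_y)
    print("third moment, score component along X:", m_x)
```

In this toy model the component along the effect $Y$ is an odd function of the symmetric noise, so its third moment vanishes in expectation, while the component along the cause $X$ has a clearly nonzero third moment. In practice the score is unknown and would have to be estimated from data, which this sketch does not attempt.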
Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy
Lin, Yingyu, Ma, Yian, Wang, Yu-Xiang, Redberg, Rachel
Posterior sampling, i.e., using the exponential mechanism to sample from the posterior distribution, provides $\varepsilon$-pure differential privacy (DP) guarantees and does not suffer from the potentially unbounded privacy breach introduced by $(\varepsilon,\delta)$-approximate DP. In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo (MCMC), thus re-introducing the unappealing $\delta$-approximation error into the privacy guarantees. To bridge this gap, we propose the Approximate SAmple Perturbation (ASAP) algorithm, which perturbs an MCMC sample with noise proportional to its Wasserstein-infinity ($W_\infty$) distance from a reference distribution that satisfies pure DP or pure Gaussian DP (i.e., $\delta=0$). We then leverage a Metropolis-Hastings algorithm to generate the sample and prove that the algorithm converges in $W_\infty$ distance. We show that by combining our new techniques with a careful localization step, we obtain the first nearly linear-time algorithm that achieves the optimal rates in the DP-ERM problem with strongly convex and smooth losses.
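As background for the Metropolis-Hastings step mentioned above, the following is a generic random-walk Metropolis-Hastings sketch targeting a Gibbs posterior proportional to $\exp(-\gamma \sum_i \ell(\theta, x_i))$. It is an illustration under assumptions, not the paper's sampler: the squared loss, the dataset `DATA`, the scale `gamma`, and the step size are all hypothetical, and the sketch includes neither the ASAP perturbation nor any $W_\infty$ convergence control, so it carries no privacy guarantee by itself.

```python
import math
import random

# Hypothetical one-dimensional dataset on [0, 1]; its mean is 0.5.
DATA = [i / 49 for i in range(50)]


def gibbs_log_density(theta, data, gamma):
    """Unnormalized log density of the Gibbs posterior with squared loss."""
    return -gamma * sum(0.5 * (theta - x) ** 2 for x in data)


def metropolis_hastings(log_density, init, n_steps, step_size, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = init
    logp = log_density(theta)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_size)
        logp_prop = log_density(proposal)
        # Symmetric proposal, so the acceptance probability is
        # min(1, p(proposal) / p(theta)); compare on the log scale.
        if math.log(1.0 - rng.random()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        samples.append(theta)
    return samples


def run_example(n_steps=20_000, burn_in=2_000):
    """Return the post-burn-in sample mean and variance of the chain."""
    chain = metropolis_hastings(
        lambda t: gibbs_log_density(t, DATA, gamma=1.0),
        init=0.5, n_steps=n_steps, step_size=0.3,
    )
    kept = chain[burn_in:]
    mean = sum(kept) / len(kept)
    var = sum((s - mean) ** 2 for s in kept) / len(kept)
    return mean, var


if __name__ == "__main__":
    print(run_example())
```

With this loss the target is Gaussian with mean equal to the data mean and variance $1/(\gamma n)$, which gives a simple sanity check on the chain. A privacy analysis would additionally require a bounded-sensitivity loss and a quantitative bound on the chain's distance to the target, as the abstract describes.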