DP-RAFT: A Differentially Private Recipe for Accelerated Fine-Tuning

Ashwinee Panda, Xinyu Tang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal

arXiv.org Artificial Intelligence 

As organizations increasingly use machine learning in real-world systems to provide insights on data generated by real users (Team, 2017), user data privacy has become one of the central open problems in machine learning. Differential privacy (DP) (Dwork et al., 2006) is the de facto standard for privacy-preserving statistics. Common algorithms for privately training machine learning models are differentially private stochastic gradient descent (DP-SGD) (Song et al., 2013; Abadi et al., 2016) and differentially private empirical risk minimization (DP-ERM) (Chaudhuri et al., 2011). Advancements in deep learning can be partially attributed to scaling up the number of model parameters (Kaplan et al., 2020; Brown et al., 2020). However, prior work (Kurakin et al., 2022; Tramèr and Boneh, 2020; Yu et al., 2021b; Shen et al., 2021) shows that increasing the number of model parameters in DP-SGD often has an adverse impact on the privacy-utility tradeoff because of the curse of dimensionality present in DP-SGD. Briefly, the "curse" is that the magnitude of the added noise scales with √d, the square root of the number of parameters d, while the signal does not scale with the number of parameters, so the signal-to-noise ratio (SNR) degrades at scale.
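
The √d scaling described above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the standard Gaussian mechanism used in DP-SGD, where each coordinate of the noise vector is drawn independently from N(0, (σC)²) and a clipped gradient has L2 norm at most C. The function names and the parameters `sigma` (noise multiplier) and `clip_norm` (clipping threshold C) are hypothetical, chosen only for illustration.

```python
import math
import random


def gaussian_noise_norm(d, sigma, clip_norm, rng):
    """L2 norm of a d-dimensional vector with i.i.d. N(0, (sigma * clip_norm)^2) entries.

    Assumption: this mirrors the per-step noise of the Gaussian mechanism;
    it is an illustration, not a full DP-SGD implementation.
    """
    std = sigma * clip_norm
    return math.sqrt(sum(rng.gauss(0.0, std) ** 2 for _ in range(d)))


def snr_upper_bound(d, sigma, clip_norm, rng):
    """Upper bound on signal-to-noise ratio: the signal norm is at most clip_norm."""
    return clip_norm / gaussian_noise_norm(d, sigma, clip_norm, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    sigma, clip_norm = 1.0, 1.0
    for d in (10**2, 10**3, 10**4, 10**5):
        noise = gaussian_noise_norm(d, sigma, clip_norm, rng)
        # The expected noise norm is close to sigma * clip_norm * sqrt(d).
        print(f"d={d:>7}  noise norm={noise:9.2f}  "
              f"sigma*C*sqrt(d)={sigma * clip_norm * math.sqrt(d):9.2f}  "
              f"SNR bound={clip_norm / noise:.4f}")
```

Under these assumptions the noise norm tracks σC√d while the signal norm stays capped at C, so the SNR bound shrinks roughly as 1/(σ√d) as the parameter count grows.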
