Differentially Private Learning Needs Better Features (or Much More Data)
Machine learning (ML) models have been successfully applied to the analysis of sensitive user data such as medical images (Lundervold & Lundervold, 2019), text messages (Chen et al., 2019) or social media posts (Wu et al., 2016). Training these ML models under the framework of differential privacy (DP) (Dwork et al., 2006b; Chaudhuri et al., 2011; Shokri & Shmatikov, 2015; Abadi et al., 2016) can protect deployed classifiers against unintentional leakage of private training data (Shokri et al., 2017; Song et al., 2017; Carlini et al., 2019).

Yet, training deep neural networks with strong DP guarantees comes at a significant cost in utility (Abadi et al., 2016; Yu et al., 2020; Bagdasaryan et al., 2019; Feldman, 2020). In fact, on many ML benchmarks the reported accuracy of private deep learning still falls short of "shallow" (non-private) techniques. For example, on CIFAR-10, Papernot et al. (2020b) train a neural network to 66.2% accuracy for a large DP budget of ε = 7.53, the highest accuracy we are aware of for this privacy budget. Yet, without privacy, higher accuracy is achievable with linear models and non-learned "handcrafted" features, e.g., (Coates & Ng, 2012; Oyallon & Mallat, 2015).

This leads to the central question of our work:

Can differentially private learning benefit from handcrafted features?

We answer this question affirmatively, by introducing simple and strong handcrafted baselines for differentially private learning that significantly improve the privacy-utility guarantees on canonical vision benchmarks.
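To make the baseline concrete, the sketch below illustrates the general recipe of training a linear model on fixed handcrafted features with DP-SGD. It is a minimal illustration, not the paper's exact pipeline: it assumes the Kymatio library for ScatterNet (Oyallon & Mallat, 2015) features and the Opacus library for DP-SGD (Abadi et al., 2016), and all hyperparameters (noise multiplier, clipping norm, learning rate, epochs) are illustrative placeholders rather than values from the paper.

```python
# Sketch: CIFAR-10 scattering features + DP-SGD linear classifier.
# Assumes kymatio and opacus are installed; hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms
from kymatio.torch import Scattering2D   # handcrafted, non-learned features
from opacus import PrivacyEngine         # per-example clipping + Gaussian noise

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Precompute scattering features once; they contain no trainable parameters.
train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=transforms.ToTensor())
raw_loader = DataLoader(train_set, batch_size=256, shuffle=False)

scattering = Scattering2D(J=2, shape=(32, 32)).to(device)
feats, labels = [], []
with torch.no_grad():
    for x, y in raw_loader:
        s = scattering(x.to(device))      # (B, 3, 81, 8, 8) for J=2, L=8
        feats.append(s.flatten(1).cpu())  # flatten to a feature vector
        labels.append(y)
features, labels = torch.cat(feats), torch.cat(labels)
train_loader = DataLoader(TensorDataset(features, labels),
                          batch_size=512, shuffle=True)

# 2. Linear model on top of the fixed features.
model = nn.Linear(features.shape[1], 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0, momentum=0.9)

# 3. Wrap model/optimizer/loader so every step clips per-example gradients
#    and adds calibrated Gaussian noise (the DP-SGD mechanism).
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,  # illustrative; sets the privacy/utility trade-off
    max_grad_norm=0.1,     # per-example gradient clipping bound
)

for epoch in range(10):
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    eps = privacy_engine.get_epsilon(delta=1e-5)
    print(f"epoch {epoch}: loss {loss.item():.3f}, epsilon {eps:.2f}")
```

Because the features are fixed, only the linear layer's gradients need to be privatized, which is precisely why such baselines can be attractive under a tight privacy budget.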