On Memorization and Privacy Risks of Sharpness Aware Minimization
Kim, Young In, Agrawal, Pratiksha, Royset, Johannes O., Khanna, Rajiv
– arXiv.org Artificial Intelligence
In many recent works, there is an increased focus on designing algorithms that seek flatter optima for neural network loss optimization, as there is empirical evidence that this leads to better generalization performance on many datasets. We define a new metric that helps us identify on which specific data points algorithms seeking flatter optima outperform vanilla SGD. We find that the generalization gains achieved by Sharpness Aware Minimization (SAM) are particularly pronounced for atypical data points, which necessitate memorization. This insight helps us unearth higher privacy risks associated with SAM, which we verify through exhaustive empirical evaluations. Finally, we propose mitigation strategies to achieve a more desirable accuracy vs. privacy tradeoff.

There has been a considerable amount of recent work exploring loss optimization that searches for flatter optima (Norton & Royset, 2021; Foret et al., 2020; Wu et al., 2020; Kim et al., 2022; Du et al., 2022; Kwon et al., 2021). Flatness here measures how similar the loss value remains under weight perturbations of a certain magnitude around the optimum. Significant empirical evidence has demonstrated that methods exploiting flatter optima tend to enjoy better generalization performance. While there have been works explaining this improvement, these studies treat test accuracy as a monolith, and do not scrutinize which specific test data points these performance gains come from, or what characterizes those points. In this work, our goal is to bridge this gap through the concept of memorization. Overparameterized neural networks are powerful models capable of achieving close to zero training loss on many datasets. A key insight into this behavior stems from distinguishing 'learning' from 'memorization' (Feldman, 2020; Feldman & Zhang, 2020). Learning here refers to the classical process of compressing the training data into a model that is then used for predictive downstream tasks.
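To make the flat-minima objective concrete, here is a minimal sketch of a single SAM update step in PyTorch, following the two-pass procedure described in Foret et al. (2020): ascend to an approximate worst-case point within an L2 ball of radius rho, take the gradient there, and apply it at the original weights. The names `model`, `loss_fn`, `base_optimizer`, and the value of `rho` are illustrative assumptions, not taken from the paper's code.

```python
# Minimal sketch of one SAM update step (Foret et al., 2020) in PyTorch.
# Assumes gradients are cleared before the call; rho is the perturbation radius.
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    # First pass: gradient of the loss at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()

    # Move to the approximate worst case in the rho-ball:
    # w_adv = w + rho * g / ||g||.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)                     # perturb the weights in place
            perturbations.append((p, e))

    # Second pass: gradient at the perturbed weights w_adv.
    model.zero_grad()
    loss_fn(model(x), y).backward()

    # Restore w, then apply the sharpness-aware gradient at w.
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
```

The two forward-backward passes are what make the optimizer prefer neighborhoods where the loss stays uniformly low, i.e. flatter optima.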
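The memorization notion the excerpt builds on can also be sketched. Below is a rough subset-resampling estimator in the spirit of Feldman & Zhang (2020): an example's memorization score is the gap between a model's accuracy on that example when it is in the training set versus when it is held out. `train_model`, the subset fraction, and the model count are hypothetical stand-ins; the actual estimator in that work trains far more models than shown here.

```python
# Rough sketch of a leave-one-out memorization estimator in the spirit of
# Feldman & Zhang (2020). train_model is a hypothetical stand-in returning a
# fitted model with a scikit-learn-style .predict(); X, y are numpy arrays.
import numpy as np

def memorization_scores(X, y, train_model, n_models=100, frac=0.7, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    hits_in = np.zeros(n); cnt_in = np.zeros(n)
    hits_out = np.zeros(n); cnt_out = np.zeros(n)
    for _ in range(n_models):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        in_mask = np.zeros(n, dtype=bool); in_mask[idx] = True
        model = train_model(X[idx], y[idx])      # fit on a random subset
        correct = (model.predict(X) == y)        # per-example correctness
        hits_in[in_mask] += correct[in_mask];     cnt_in[in_mask] += 1
        hits_out[~in_mask] += correct[~in_mask];  cnt_out[~in_mask] += 1
    # mem(i) ~ P[correct | i in train] - P[correct | i held out]
    return hits_in / np.maximum(cnt_in, 1) - hits_out / np.maximum(cnt_out, 1)
```

Atypical points score high under this metric: the model only predicts them correctly when it has seen them, which is exactly the behavior the abstract links to SAM's gains and to elevated privacy risk.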
Jan-3-2024
- Country:
- North America > United States (0.29)
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)