Be Careful What You Smooth For: Label Smoothing Can Be a Privacy Shield but Also a Catalyst for Model Inversion Attacks
Struppek, Lukas, Hintersdorf, Dominik, Kersting, Kristian
Label smoothing, i.e., using softened labels instead of hard ones, is a widely adopted regularization method for deep learning, offering diverse benefits such as improved generalization and calibration. Its implications for preserving model privacy, however, have remained unexplored. To fill this gap, we investigate the impact of label smoothing on model inversion attacks (MIAs), which aim to generate class-representative samples by exploiting the knowledge encoded in a classifier, thereby inferring sensitive information about its training data. Through extensive analyses, we uncover that traditional label smoothing fosters MIAs, thereby increasing a model's privacy leakage. Moreover, we reveal that smoothing with negative factors counters this trend, impeding the extraction of class-related information, preserving privacy, and outperforming state-of-the-art defenses. This establishes a practical and powerful new way to enhance model resilience against MIAs.

Deep learning classifiers continue to achieve remarkable performance across a wide spectrum of domains (Radford et al., 2021; Ramesh et al., 2022; OpenAI, 2023), due in part to powerful regularization techniques. The common Label Smoothing (LS) regularization (Szegedy et al., 2016) replaces hard labels with a smoothed version by mixing them with a uniform distribution, improving generalization and model calibration (Pereyra et al., 2017; Müller et al., 2019). However, the very capabilities that make these models so effective also render them susceptible to privacy attacks, potentially resulting in the leakage of sensitive information about their training data.

One category of privacy breach arises from model inversion attacks (MIAs) (Fredrikson et al., 2015), a class of attacks designed to extract characteristic visual features of individual training classes from a trained classifier. In the commonly investigated setting of face recognition, the target model is trained on facial images to predict a person's identity. Without any further information about the individual identities, MIAs exploit the target model's learned knowledge to create synthetic images that reveal the visual characteristics of specific classes. As a practical example, consider a high-security facility that uses a face recognition model for access control. MIAs could enable an unauthorized adversary, with nothing more than access to the model, to reconstruct facial features and infer the identities of authorized staff. A successful attack can thus lead to access-control breaches and to security and privacy threats for the individuals involved.
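As a rough illustration (not code from the paper), the label smoothing formulation described above mixes a one-hot label with a uniform distribution using a factor alpha. The sketch below assumes a PyTorch-style cross-entropy against explicit soft targets; the function name and hyperparameter values are placeholders, and the negative-alpha variant is shown only to mirror the idea sketched in the abstract, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits: torch.Tensor,
                           targets: torch.Tensor,
                           alpha: float = 0.1) -> torch.Tensor:
    """Cross-entropy against smoothed targets.

    Smoothed target for true class k:
        (1 - alpha) * one_hot(k) + alpha / num_classes
    alpha > 0 is standard label smoothing; alpha < 0 (the variant the
    abstract calls negative smoothing) sharpens the target instead.
    """
    num_classes = logits.size(-1)
    one_hot = F.one_hot(targets, num_classes).float()
    smooth_targets = (1.0 - alpha) * one_hot + alpha / num_classes
    log_probs = F.log_softmax(logits, dim=-1)
    return -(smooth_targets * log_probs).sum(dim=-1).mean()

# Example: positive smoothing (regularization) vs. a negative factor
# (the privacy-oriented variant studied in the paper).
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss_pos = smoothed_cross_entropy(logits, labels, alpha=0.1)
loss_neg = smoothed_cross_entropy(logits, labels, alpha=-0.05)
```

Note that PyTorch's built-in `label_smoothing` argument of `CrossEntropyLoss` appears to accept only factors in [0, 1], which is why a negative factor is expressed here via explicit soft targets.

Similarly, the simplest form of model inversion in the spirit of Fredrikson et al. (2015) can be sketched as gradient ascent on the input to maximize the classifier's confidence for one target identity. This is a hypothetical, minimal illustration, not the attack evaluated in the paper; `model` is assumed to be a trained face-recognition classifier, and the input shape, step count, and learning rate are arbitrary.

```python
import torch

def naive_model_inversion(model: torch.nn.Module,
                          target_class: int,
                          input_shape=(1, 3, 64, 64),
                          steps: int = 500,
                          lr: float = 0.1) -> torch.Tensor:
    """Optimize an input image to maximize the target class probability."""
    model.eval()
    x = torch.zeros(input_shape, requires_grad=True)  # start from a blank image
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x)
        # Minimizing the negative log-probability of the target class
        # drives the image toward features the model associates with it.
        loss = -torch.log_softmax(logits, dim=-1)[0, target_class]
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)  # keep pixel values in a valid range
    return x.detach()
```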
arXiv.org Artificial Intelligence
Oct-10-2023
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Materials > Chemicals > Specialty Chemicals (0.40)
- Technology: