Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation
Takahashi, Hiroshi; Iwata, Tomoharu; Kumagai, Atsutoshi; Yamanaka, Yuuki; Yamashita, Tomoya
Diffusion models are powerful generative models, but they often generate sensitive data that users do not want, mainly because large-scale unlabeled training data frequently contain such data. Since labeling all sensitive data in a large-scale unlabeled training set is impractical, we address this problem using a small amount of labeled sensitive data. In this paper, we propose positive-unlabeled diffusion models, which prevent the generation of sensitive data using only unlabeled data and labeled sensitive data. Our approach approximates the evidence lower bound (ELBO) for normal (negative) data using only unlabeled and sensitive (positive) data. Therefore, even without labeled normal data, we can maximize the ELBO for normal data and minimize it for labeled sensitive data, ensuring the generation of only normal data. Through experiments across various datasets and settings, we demonstrate that our approach can prevent the generation of sensitive images without compromising image quality.

The training of diffusion models can be regarded as maximizing the evidence lower bound (ELBO), a tractable lower bound on the log-likelihood, over the training data (Ho et al., 2020). Users collect these training data from sources such as the internet to generate the content they want, and then either train from scratch or fine-tune. Unfortunately, diffusion models can generate inappropriate, discriminatory, or harmful content that users do not want (Brack et al., 2022). For example, they might generate sexual images of real individuals (Mirsky & Lee, 2021; Verdoliva, 2020).
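To make the positive-unlabeled idea from the abstract concrete, the following is a minimal sketch and not necessarily the paper's exact objective. It assumes the unlabeled data are a mixture of normal and sensitive data with a known sensitive fraction `beta`, so that the mean loss on normal data can be recovered from the mean losses on unlabeled and sensitive data. The function name `pu_normal_loss`, the parameter `beta`, and the clamp at zero (borrowed from non-negative PU learning) are illustrative assumptions; the per-sample losses stand in for negative ELBO values such as denoising losses.

```python
def pu_normal_loss(loss_unlabeled, loss_sensitive, beta):
    """Estimate the mean loss on normal data without normal labels.

    Assumes (hypothetically) that unlabeled data follow the mixture
    p_U = (1 - beta) * p_N + beta * p_P, where p_P is the sensitive
    (positive) distribution and beta is its assumed known fraction.
    Under that assumption: E_N[loss] = (E_U[loss] - beta * E_P[loss]) / (1 - beta).
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    mean_u = sum(loss_unlabeled) / len(loss_unlabeled)
    mean_p = sum(loss_sensitive) / len(loss_sensitive)
    estimate = (mean_u - beta * mean_p) / (1.0 - beta)
    # Assumption: the underlying loss is non-negative, so a negative
    # estimate can only come from sampling noise and is clamped to zero.
    return max(estimate, 0.0)


# Toy check: normal samples have loss 1.0, sensitive samples have loss 3.0,
# and 25% of the unlabeled set is sensitive.
unlabeled = [1.0] * 75 + [3.0] * 25
sensitive = [3.0] * 10
recovered = pu_normal_loss(unlabeled, sensitive, beta=0.25)
print(recovered)
```

In this toy setting the estimate matches the loss on the normal samples even though none of them were labeled. A full training objective would additionally need a term that raises the loss (lowers the ELBO) on the labeled sensitive samples, which this sketch leaves out.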
Mar-5-2025