Training-Free Safe Denoisers for Safe Use of Diffusion Models

Kim, Mingyu; Kim, Dongjun; Yusuf, Amman; Ermon, Stefano; Park, Mi Jung

Feb-12-2025–arXiv.org Artificial Intelligence

There is growing concern over the safety of powerful diffusion models (DMs), as they are often misused to produce inappropriate, not-safe-for-work (NSFW) content, or to generate copyrighted material or data of individuals who wish to be forgotten. Many existing methods tackle these issues by relying heavily on text-based negative prompts or by extensively retraining DMs to eliminate certain features or samples. In this paper, we take a radically different approach: we directly modify the sampling trajectory by leveraging a negation set (e.g., unsafe images, copyrighted data, or data points that need to be excluded) to avoid specific regions of the data distribution, without retraining or fine-tuning DMs. We formally derive the relationship between the expected denoised samples that are safe and those that are not, leading to our $\textit{safe}$ denoiser, which ensures that its final samples stay away from the area to be negated. Inspired by this derivation, we develop a practical algorithm that produces high-quality samples while avoiding the negated regions of the data distribution in text-conditional, class-conditional, and unconditional image generation. These results point to the potential of our training-free safe denoiser for using DMs more safely.
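One way to see a relationship between safe and unsafe expected denoised samples is the law of total expectation. Splitting the data into a safe part and a negation set $\mathcal{N}$, and writing $\beta(x_t) = p(x_0 \in \mathcal{N} \mid x_t)$, we have $\mathbb{E}[x_0 \mid x_t] = (1-\beta)\,\mathbb{E}[x_0 \mid x_t, x_0 \notin \mathcal{N}] + \beta\,\mathbb{E}[x_0 \mid x_t, x_0 \in \mathcal{N}]$, so the safe denoiser is $(\mathbb{E}[x_0 \mid x_t] - \beta\,\mathbb{E}[x_0 \mid x_t, x_0 \in \mathcal{N}])/(1-\beta)$. The snippet below is a minimal sketch of that identity under our own assumptions, not the authors' practical algorithm: the data distribution is a uniform empirical distribution over a tiny known dataset, the forward process is $x_t = \alpha x_0 + \sigma \epsilon$, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def posterior_weights(x_t: Sequence[float], data: Sequence[Sequence[float]],
                      alpha: float, sigma: float) -> Vector:
    """p(x0 = x_i | x_t) for a uniform prior over `data` and the assumed
    Gaussian forward process x_t = alpha * x0 + sigma * eps."""
    logits = [
        -sum((a - alpha * b) ** 2 for a, b in zip(x_t, x_i)) / (2.0 * sigma ** 2)
        for x_i in data
    ]
    m = max(logits)  # subtract the max logit for a numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def weighted_mean(weights: Sequence[float], data: Sequence[Sequence[float]]) -> Vector:
    """Posterior-weighted average of the data points."""
    return [sum(w * x[k] for w, x in zip(weights, data)) for k in range(len(data[0]))]


def denoiser(x_t: Sequence[float], data: Sequence[Sequence[float]],
             alpha: float, sigma: float) -> Vector:
    """Ideal denoiser E[x0 | x_t] when p(x0) is uniform over `data`."""
    return weighted_mean(posterior_weights(x_t, data, alpha, sigma), data)


def safe_denoiser(x_t: Sequence[float], safe: Sequence[Sequence[float]],
                  negation: Sequence[Sequence[float]],
                  alpha: float, sigma: float) -> Vector:
    """E[x0 | x_t, x0 not in negation], rebuilt from the full denoiser,
    the negation-set denoiser and beta(x_t) = p(x0 in negation | x_t)."""
    data = list(safe) + list(negation)
    w = posterior_weights(x_t, data, alpha, sigma)
    beta = sum(w[len(safe):])                    # posterior mass on the negation set
    full = weighted_mean(w, data)                # stands in for the pretrained DM output
    neg = denoiser(x_t, negation, alpha, sigma)  # E[x0 | x_t, x0 in negation]
    # Invert the mixture identity; this is ill-conditioned when beta is close to 1.
    return [(f - beta * n) / (1.0 - beta) for f, n in zip(full, neg)]


if __name__ == "__main__":
    safe = [[1.0, 1.0], [2.0, 0.5]]
    negation = [[-1.0, -1.0]]
    x_t, alpha, sigma = [0.3, 0.2], 0.8, 0.6
    print(safe_denoiser(x_t, safe, negation, alpha, sigma))
    print(denoiser(x_t, safe, alpha, sigma))  # matches the line above up to rounding
```

Because every term is computed in closed form in this toy setting, the two printed vectors agree. In a real DM only the full denoiser is available from the network, so $\beta$ and the negation-set term would have to be approximated from the negation set alone; this sketch does not attempt that approximation.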

artificial intelligence, machine learning, training-free safe denoiser