Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning

Jun-15-2026, 07:10:40 GMT–Neural Information Processing Systems

Text-to-image (T2I) diffusion models have achieved impressive image generation quality and are increasingly fine-tuned for personalized applications. However, these models often inherit unsafe behaviors from toxic pretraining data, raising growing safety concerns. While recent safety-driven unlearning methods have made promising progress in suppressing model toxicity, we find that they are fragile to downstream fine-tuning: state-of-the-art methods largely fail to retain their effectiveness even when fine-tuned on entirely benign datasets. To mitigate this problem, we propose ResAlign, a safety-driven unlearning framework with enhanced resilience against downstream fine-tuning. By modeling downstream fine-tuning as an implicit optimization problem with a Moreau envelope-based reformulation, ResAlign enables efficient gradient estimation to minimize the recovery of harmful behaviors. Additionally, we propose a meta-learning strategy that simulates a diverse distribution of fine-tuning scenarios to improve generalization. Extensive experiments across a wide range of datasets, fine-tuning methods, and configurations demonstrate that ResAlign consistently outperforms prior unlearning approaches in retaining safety, while effectively preserving benign generation capability. Our code and pretrained models are publicly available. Disclaimer: This paper includes AI-generated images containing partially nude human figures and other sensitive content, shown only for research purposes.
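To make the Moreau envelope idea concrete, the toy sketch below uses the standard textbook definition, M(x) = min_y f(y) + (y - x)^2 / (2 * lam), on the scalar function f(y) = |y|. It is not the ResAlign implementation: the choice of f, the step parameter `lam`, and all function names are assumptions made purely for illustration. The point it shows is that the envelope's gradient can be read off the inner minimizer as (x - prox(x)) / lam, without differentiating through the inner optimization itself.

```python
def prox_abs(x: float, lam: float) -> float:
    """Proximal operator of f(y) = |y| (soft-thresholding).

    Returns the y that minimizes |y| + (y - x)^2 / (2 * lam).
    """
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def moreau_envelope_abs(x: float, lam: float) -> float:
    """Moreau envelope of |.| at x, evaluated at the inner minimizer."""
    y = prox_abs(x, lam)
    return abs(y) + (y - x) ** 2 / (2.0 * lam)


def moreau_grad_abs(x: float, lam: float) -> float:
    """Gradient of the envelope, using only the inner minimizer.

    Standard identity for convex f: grad M(x) = (x - prox(x)) / lam.
    """
    return (x - prox_abs(x, lam)) / lam


if __name__ == "__main__":
    lam = 1.0
    for x in (3.0, 0.5, -2.0):
        # Compare the closed-form gradient with a central finite difference.
        h = 1e-6
        fd = (moreau_envelope_abs(x + h, lam) - moreau_envelope_abs(x - h, lam)) / (2 * h)
        print(x, moreau_envelope_abs(x, lam), moreau_grad_abs(x, lam), fd)
```

In this toy case the inner problem has a closed-form solution. In a fine-tuning setting the inner minimizer would presumably have to be approximated by running some optimization steps, which is where an identity of this kind could save the cost of backpropagating through those steps.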

artificial intelligence, deep learning, machine learning

Country:
- Asia (0.28)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)
  - Promising Solution (0.87)

Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.93)
