Personalized Safety Alignment for Text-to-Image Diffusion Models
Yu Lei, Jinbin Bai, Qingyu Shi, Aosong Feng, Kaidong Yu
Text-to-image diffusion models have revolutionized visual content generation, but current safety mechanisms apply uniform standards that often fail to account for individual user preferences. These models overlook the diverse safety boundaries shaped by factors like age, mental health, and personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that allows user-specific control over safety behaviors in generative models. PSA integrates personalized user profiles into the diffusion process, adjusting the model's behavior to match individual safety preferences while preserving image quality. We introduce Sage, a new dataset that captures user-specific safety preferences, and incorporate user profiles into the model through a cross-attention mechanism. Experiments show that PSA outperforms existing methods in harmful content suppression and better aligns generated content with user constraints, achieving higher Win Rate and Pass Rate scores.
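The abstract's core mechanism, conditioning generation on a user profile via cross-attention, can be illustrated with a toy sketch. Here the profile embedding is simply appended to the text-token context so that attention keys and values cover both the prompt and the user's safety preferences; the function name, shapes, and identity projections are hypothetical simplifications, not PSA's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def profile_cross_attention(latents, text_tokens, user_profile):
    """Toy cross-attention with a personalized context.

    latents:      (n_latent, d) image-side queries
    text_tokens:  (n_text, d) prompt embeddings
    user_profile: (d,) user-preference embedding (assumed representation)

    The profile is concatenated to the text context, so every latent
    query can attend to the user's safety preferences as well as the
    prompt. Projections are identities for brevity.
    """
    context = np.concatenate([text_tokens, user_profile[None, :]], axis=0)
    scores = latents @ context.T / np.sqrt(latents.shape[-1])
    return softmax(scores, axis=-1) @ context

rng = np.random.default_rng(0)
out = profile_cross_attention(rng.normal(size=(4, 8)),
                              rng.normal(size=(3, 8)),
                              rng.normal(size=(8,)))
```

In this sketch, attended output rows are convex combinations of prompt and profile embeddings, which is one simple way a profile can bias generation.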
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation
Runtao Liu, Chen I Chieh, Jindong Gu, Jipeng Zhang, Renjie Pi, Qifeng Chen, Philip Torr, Ashkan Khakzar, Fabio Pizzati
Text-to-image (T2I) models have become widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse. Current safety measures are typically limited to text-based filtering or concept-removal strategies, which can remove only a few concepts from the model's generative capabilities. In this work, we introduce SafetyDPO, a method for safety alignment of T2I models through Direct Preference Optimization (DPO). We enable the application of DPO for safety purposes in T2I models by synthetically generating a dataset of harmful and safe image-text pairs, which we call CoProV2. Using a custom DPO strategy and this dataset, we train safety experts, in the form of low-rank adaptation (LoRA) matrices, that guide the generation process away from specific safety-related concepts. We then merge the experts into a single LoRA using a novel merging strategy for optimal scaling performance. This expert-based approach enables scalability, allowing us to remove 7 times more harmful concepts from T2I models compared to baselines. SafetyDPO consistently outperforms the state-of-the-art on many benchmarks and establishes new practices for safety alignment in T2I networks. Code and data will be shared at https://safetydpo.github.io/.
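To make the preference-optimization step concrete, the following is a minimal sketch of the standard DPO objective on one (safe, harmful) pair: the policy is rewarded for increasing its likelihood of the safe sample relative to a frozen reference model. This is the generic DPO loss, not SafetyDPO's diffusion-specific adaptation, and the argument names and `beta` value are illustrative:

```python
import math

def dpo_loss(logp_safe, logp_harmful, ref_safe, ref_harmful, beta=0.1):
    """Generic DPO loss on one preference pair (illustrative sketch).

    logp_*: policy log-likelihoods of the preferred (safe) and
            dispreferred (harmful) samples
    ref_*:  the same log-likelihoods under a frozen reference model
    beta:   strength of the KL-style regularization toward the reference

    Loss = -log sigmoid(beta * (policy margin - reference margin)),
    so it shrinks as the policy prefers the safe sample more strongly
    than the reference does.
    """
    margin = beta * ((logp_safe - ref_safe) - (logp_harmful - ref_harmful))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In a diffusion setting, the log-likelihood terms are typically replaced by denoising-error differences on the paired images, with the LoRA matrices as the only trainable parameters.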