Personalized Safety Alignment for Text-to-Image Diffusion Models
Yu Lei, Jinbin Bai, Qingyu Shi, Aosong Feng, Kaidong Yu
Text-to-image diffusion models have revolutionized visual content generation, but current safety mechanisms apply uniform standards that often fail to account for individual user preferences. These models overlook the diverse safety boundaries shaped by factors like age, mental health, and personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that allows user-specific control over safety behaviors in generative models. PSA integrates personalized user profiles into the diffusion process, adjusting the model's behavior to match individual safety preferences while preserving image quality. We introduce Sage, a new dataset that captures user-specific safety preferences, and incorporate user profiles into the model through a cross-attention mechanism. Experiments show that PSA outperforms existing methods in harmful content suppression and better aligns generated content with user constraints, achieving higher Win Rate and Pass Rate scores.
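The abstract's core mechanism, conditioning generation on a user profile via cross-attention, can be illustrated with a toy sketch. Here the profile embedding is simply appended to the text-token context so that attention keys and values cover both the prompt and the user's safety preferences; the function name, shapes, and identity projections are hypothetical simplifications, not PSA's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def profile_cross_attention(latents, text_tokens, user_profile):
    """Toy cross-attention with a personalized context.

    latents:      (n_latent, d) image-side queries
    text_tokens:  (n_text, d) prompt embeddings
    user_profile: (d,) user-preference embedding (assumed representation)

    The profile is concatenated to the text context, so every latent
    query can attend to the user's safety preferences as well as the
    prompt. Projections are identities for brevity.
    """
    context = np.concatenate([text_tokens, user_profile[None, :]], axis=0)
    scores = latents @ context.T / np.sqrt(latents.shape[-1])
    return softmax(scores, axis=-1) @ context

rng = np.random.default_rng(0)
out = profile_cross_attention(rng.normal(size=(4, 8)),
                              rng.normal(size=(3, 8)),
                              rng.normal(size=(8,)))
```

In this sketch, attended output rows are convex combinations of prompt and profile embeddings, which is one simple way a profile can bias generation.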
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation
Runtao Liu, Chen I Chieh, Jindong Gu, Jipeng Zhang, Renjie Pi, Qifeng Chen, Philip Torr, Ashkan Khakzar, Fabio Pizzati
Text-to-image (T2I) models have become widespread, but their limited safety guardrails expose end users to harmful content and potentially allow for model misuse. Current safety measures are typically limited to text-based filtering or concept-removal strategies, which can remove only a few concepts from the model's generative capabilities. In this work, we introduce SafetyDPO, a method for safety alignment of T2I models through Direct Preference Optimization (DPO). We enable the application of DPO for safety purposes in T2I models by synthetically generating a dataset of harmful and safe image-text pairs, which we call CoProV2. Using a custom DPO strategy and this dataset, we train safety experts, in the form of low-rank adaptation (LoRA) matrices, that guide the generation process away from specific safety-related concepts. We then merge the experts into a single LoRA using a novel merging strategy for optimal scaling performance. This expert-based approach enables scalability, allowing us to remove 7 times more harmful concepts from T2I models compared to baselines. SafetyDPO consistently outperforms the state-of-the-art on many benchmarks and establishes new practices for safety alignment in T2I networks. Code and data will be shared at https://safetydpo.github.io/.
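To make the preference-optimization step concrete, the following is a minimal sketch of the standard DPO objective on one (safe, harmful) pair: the policy is rewarded for increasing its likelihood of the safe sample relative to a frozen reference model. This is the generic DPO loss, not SafetyDPO's diffusion-specific adaptation, and the argument names and `beta` value are illustrative:

```python
import math

def dpo_loss(logp_safe, logp_harmful, ref_safe, ref_harmful, beta=0.1):
    """Generic DPO loss on one preference pair (illustrative sketch).

    logp_*: policy log-likelihoods of the preferred (safe) and
            dispreferred (harmful) samples
    ref_*:  the same log-likelihoods under a frozen reference model
    beta:   strength of the KL-style regularization toward the reference

    Loss = -log sigmoid(beta * (policy margin - reference margin)),
    so it shrinks as the policy prefers the safe sample more strongly
    than the reference does.
    """
    margin = beta * ((logp_safe - ref_safe) - (logp_harmful - ref_harmful))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In a diffusion setting, the log-likelihood terms are typically replaced by denoising-error differences on the paired images, with the LoRA matrices as the only trainable parameters.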