SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge
Yousaf, Adeel, Fioresi, Joseph, Beetham, James, Bedi, Amrit Singh, Shah, Mubarak
Improving the safety of vision-language models like CLIP via fine-tuning often comes at a steep price, causing significant drops in their generalization performance. We find this trade-off stems from rigid alignment strategies that force unsafe concepts toward single, predefined safe targets, disrupting the model's learned semantic structure. To address this, we propose a proximity-aware approach: redirecting unsafe concepts to their semantically closest safe alternatives to minimize representational change. We introduce SafeR-CLIP, a fine-tuning framework that applies this principle of minimal intervention. SafeR-CLIP successfully reconciles safety and performance, recovering up to 8.0% in zero-shot accuracy over prior methods while maintaining robust safety. To support more rigorous evaluation, we also contribute NSFW-Caps, a new benchmark of 1,000 highly-aligned pairs for testing safety under distributional shift. Our work shows that respecting the geometry of pretrained representations is key to achieving safety without sacrificing performance.
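The proximity-aware redirection described above can be sketched in a few lines. This is an illustrative assumption, not the authors' implementation: it uses cosine similarity to stand in for "semantically closest," and the function name and toy data are hypothetical.

```python
# Hypothetical sketch of proximity-aware target selection: each unsafe
# concept embedding is paired with the safe embedding closest to it in
# cosine similarity, so redirection changes the representation minimally.
import numpy as np

def nearest_safe_targets(unsafe_emb: np.ndarray, safe_emb: np.ndarray) -> np.ndarray:
    """For each unsafe embedding, return the index of the closest safe one."""
    u = unsafe_emb / np.linalg.norm(unsafe_emb, axis=1, keepdims=True)
    s = safe_emb / np.linalg.norm(safe_emb, axis=1, keepdims=True)
    sims = u @ s.T                 # pairwise cosine similarities
    return sims.argmax(axis=1)     # closest safe target per unsafe concept

# Toy usage: 4 unsafe concepts, 10 candidate safe alternatives, dim 8.
rng = np.random.default_rng(0)
unsafe = rng.normal(size=(4, 8))
safe = rng.normal(size=(10, 8))
targets = nearest_safe_targets(unsafe, safe)
```

Each selected target would then serve as the fine-tuning destination for its unsafe concept, rather than a single predefined safe anchor.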
Safe Vision-Language Models via Unsafe Weights Manipulation
D'Incà, Moreno, Peruzzo, Elia, Xu, Xingqian, Shi, Humphrey, Sebe, Nicu, Mancini, Massimiliano
Vision-language models (VLMs) often inherit the biases and unsafe associations present within their large-scale training datasets. While recent approaches mitigate unsafe behaviors, their evaluation focuses on how safe the model is on unsafe inputs, ignoring potential shortcomings on safe ones. In this paper, we first revisit safety evaluation by introducing SafeGround, a new set of metrics that evaluate safety at different levels of granularity. With these metrics, we uncover a surprising issue of training-based methods: they make the model less safe on safe inputs. From this finding, we take a different direction and explore whether it is possible to make a model safer without training, introducing Unsafe Weights Manipulation (UWM). UWM uses a calibration set of safe and unsafe instances to compare activations between safe and unsafe content, identifying the most important parameters for processing the latter. Their values are then manipulated via negation. Experiments show that UWM achieves the best tradeoff between safety and knowledge preservation, consistently improving VLMs on unsafe queries while outperforming even training-based state-of-the-art methods on safe ones.
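The UWM pipeline above (score parameters by their contribution on unsafe vs. safe calibration inputs, then negate the top-scoring ones) can be sketched for a single linear layer. The scoring rule below (weight magnitude times mean input magnitude) is an assumed stand-in for the paper's importance measure; all names are hypothetical.

```python
# Illustrative sketch (not the authors' code): rank each weight of a linear
# layer by how much more it contributes on unsafe calibration inputs than on
# safe ones, then negate the top-k weights to suppress unsafe processing.
import numpy as np

def uwm_negate(W: np.ndarray, safe_x: np.ndarray, unsafe_x: np.ndarray, k: int) -> np.ndarray:
    # Per-weight contribution proxy: |w_ij| * mean |x_j| over the calibration set.
    safe_score = np.abs(W) * np.abs(safe_x).mean(axis=0)
    unsafe_score = np.abs(W) * np.abs(unsafe_x).mean(axis=0)
    gap = unsafe_score - safe_score           # weights used mostly for unsafe content
    idx = np.argsort(gap, axis=None)[-k:]     # flattened indices of the top-k weights
    W_new = W.copy()
    W_new.flat[idx] *= -1.0                   # manipulate their values via negation
    return W_new

# Toy usage: a 6x6 layer with 32 safe and 32 unsafe calibration samples.
rng = np.random.default_rng(1)
W = rng.normal(size=(6, 6))
safe_x = rng.normal(size=(32, 6))
unsafe_x = rng.normal(size=(32, 6))
W2 = uwm_negate(W, safe_x, unsafe_x, k=5)
```

Because only signs are flipped, the weight magnitudes (and thus most of the model's knowledge) are left untouched, which matches the training-free, knowledge-preserving framing of the abstract.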
Removing NSFW Concepts from Vision-and-Language Models for Text-to-Image Retrieval and Generation
Poppi, Samuele, Poppi, Tobia, Cocchi, Federico, Cornia, Marcella, Baraldi, Lorenzo, Cucchiara, Rita
Vision-and-Language models such as CLIP have demonstrated remarkable effectiveness across a wide range of tasks. However, these models are typically trained on web-scale data, which can introduce inappropriate content and lead to the development of unsafe and biased behavior. This, in turn, hampers their applicability in sensitive and trustworthy contexts and could raise significant concerns about their adoption. To overcome these limitations, we introduce a methodology to make Vision-and-Language models safer by removing their sensitivity to not-safe-for-work concepts. We show how this can be done by distilling from a large language model that converts between safe and unsafe sentences and that is fine-tuned starting from just 100 manually-curated pairs. We conduct extensive experiments on the resulting embedding space for both retrieval and text-to-image generation, where we show that our model can also be properly employed with pre-trained image generators. Our source code and trained models are available at: https://github.com/aimagelab/safe-clip.
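The distillation setup in the last abstract pairs each unsafe sentence with an LLM-generated safe counterpart; a natural training signal is then a loss that pulls the encoder's embedding of the unsafe sentence toward the (frozen) embedding of its safe pair. The cosine-based loss below is a hedged sketch of such an objective, not the paper's actual loss function.

```python
# Hedged sketch of a redirection objective: given paired (unsafe, safe)
# sentence embeddings, penalize the cosine distance between the encoder's
# unsafe-sentence output and the frozen embedding of its safe counterpart.
import numpy as np

def redirection_loss(unsafe_out: np.ndarray, safe_ref: np.ndarray) -> float:
    """Mean (1 - cosine similarity) over a batch of paired embeddings."""
    u = unsafe_out / np.linalg.norm(unsafe_out, axis=1, keepdims=True)
    s = safe_ref / np.linalg.norm(safe_ref, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(u * s, axis=1)))

# Toy usage: a batch of 5 paired embeddings, dim 8.
rng = np.random.default_rng(2)
unsafe_out = rng.normal(size=(5, 8))
safe_ref = rng.normal(size=(5, 8))
loss = redirection_loss(unsafe_out, safe_ref)
```

Driving this loss to zero makes unsafe prompts retrieve (or generate from) the same region of the embedding space as their safe rewrites, which is why the cleaned encoder can be dropped in front of a pre-trained image generator.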