LLavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment

Helff, Lukas, Friedrich, Felix, Brack, Manuel, Kersting, Kristian, Schramowski, Patrick

Jun-7-2024–arXiv.org Artificial Intelligence

We introduce LlavaGuard, a family of VLM-based safeguard models that offers a versatile framework for evaluating the safety compliance of visual content. Specifically, we designed LlavaGuard for dataset annotation and generative model safeguarding. To this end, we collected and annotated a high-quality visual dataset incorporating a broad safety taxonomy, which we use to tune VLMs on context-aware safety risks. As a key innovation, LlavaGuard's responses contain comprehensive information, including a safety rating, the violated safety categories, and an in-depth rationale. Further, the customizable taxonomy categories we introduce enable the context-specific alignment of LlavaGuard to various scenarios. Our experiments highlight the capabilities of LlavaGuard in complex, real-world applications. We provide checkpoints ranging from 7B to 34B parameters that demonstrate state-of-the-art performance, with even the smallest models outperforming baselines such as GPT-4. We make our dataset and model weights publicly available and invite further research to address the diverse needs of communities and contexts.
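The abstract describes responses that carry a safety rating, the violated safety categories, and a rationale, together with taxonomy categories that can be customized per scenario. The following is a minimal Python sketch of how such a response might be consumed downstream. It assumes the response arrives as JSON; the field names (`rating`, `categories`, `rationale`), the category labels, and the helpers `parse_assessment` and `violates_policy` are all hypothetical and are not taken from the paper or its released code.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class SafetyAssessment:
    """Hypothetical container for one structured safety response."""
    rating: str            # assumed values: "Safe" or "Unsafe"
    categories: list[str]  # violated categories; empty when none apply
    rationale: str         # free-text explanation of the rating


def parse_assessment(raw: str) -> SafetyAssessment:
    """Parse a JSON string with the assumed field names into a dataclass."""
    data = json.loads(raw)
    return SafetyAssessment(
        rating=str(data["rating"]),
        categories=[str(c) for c in data.get("categories", [])],
        rationale=str(data["rationale"]),
    )


def violates_policy(assessment: SafetyAssessment, enabled: set[str]) -> bool:
    """Flag content only if it is rated unsafe in a category the policy enables.

    Leaving a category out of `enabled` mimics switching it off for a
    given deployment context.
    """
    if assessment.rating.lower() != "unsafe":
        return False
    return any(category in enabled for category in assessment.categories)


# Illustrative response; the category label is invented for this example.
raw_response = json.dumps({
    "rating": "Unsafe",
    "categories": ["Weapons"],
    "rationale": "The image prominently shows a firearm.",
})
assessment = parse_assessment(raw_response)
strict_policy = {"Weapons", "Hate"}
lenient_policy = {"Hate"}
print(violates_policy(assessment, strict_policy))
print(violates_policy(assessment, lenient_policy))
```

In this sketch the same assessment is flagged under one policy and passed under another, which is one simple way to picture context-specific alignment. The actual models may condition on the policy directly rather than filtering afterwards.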

large language model, machine learning, natural language

Country:
- Europe (0.28)
- North America
  - Canada (0.14)
  - United States (0.14)

Genre:
- Research Report (0.64)

Industry:
- Government > Regional Government (0.68)
- Health & Medicine > Therapeutic Area
  - Psychiatry/Psychology (0.70)
- Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.89)
  - Natural Language > Large Language Model (1.00)
