LlavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment
Helff, Lukas, Friedrich, Felix, Brack, Manuel, Kersting, Kristian, Schramowski, Patrick
arXiv.org Artificial Intelligence
We introduce LlavaGuard, a family of VLM-based safeguard models that offers a versatile framework for evaluating the safety compliance of visual content. Specifically, we designed LlavaGuard for dataset annotation and generative model safeguarding. To this end, we collected and annotated a high-quality visual dataset incorporating a broad safety taxonomy, which we use to tune VLMs on context-aware safety risks. As a key innovation, LlavaGuard's responses contain comprehensive information, including a safety rating, the violated safety categories, and an in-depth rationale. Further, our customizable taxonomy categories enable context-specific alignment of LlavaGuard to various scenarios. Our experiments highlight the capabilities of LlavaGuard in complex and real-world applications. We provide checkpoints ranging from 7B to 34B parameters demonstrating state-of-the-art performance, with even the smallest models outperforming baselines like GPT-4. We make our dataset and model weights publicly available and invite further research to address the diverse needs of communities and contexts.
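The abstract describes a structured assessment output: a safety rating, the violated safety categories, and a rationale. The sketch below illustrates how a downstream curation pipeline might parse such a response; the field names and the JSON shape are assumptions for illustration, not LlavaGuard's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass
class SafetyAssessment:
    """The three response components named in the abstract.

    Field names are hypothetical, not taken from LlavaGuard itself.
    """
    rating: str      # e.g. "Safe" / "Unsafe"
    category: str    # violated taxonomy category, if any
    rationale: str   # in-depth explanation for the rating


def parse_assessment(raw: str) -> SafetyAssessment:
    # Assumes the safeguard model returns its assessment as a JSON string.
    data = json.loads(raw)
    return SafetyAssessment(
        rating=data["rating"],
        category=data["category"],
        rationale=data["rationale"],
    )


# Example with a made-up model response:
sample = (
    '{"rating": "Unsafe", "category": "Violence", '
    '"rationale": "The image depicts graphic violence."}'
)
assessment = parse_assessment(sample)
print(assessment.rating)  # → Unsafe
```

A dataset-curation loop could apply `parse_assessment` to each image's response and filter or flag items whose rating is not "Safe", keeping the rationale for audit.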
Jun-7-2024