USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions
Rahimi, Hamed, Bahaj, Adil, Abrini, Mouad, Khoramshahi, Mahdi, Ghogho, Mounir, Chetouani, Mohamed
arXiv.org Artificial Intelligence
The integration of vision-language models into robotic systems constitutes a significant advancement in enabling machines to interact with their surroundings in a more intuitive manner. While VLMs offer rich multimodal reasoning, existing approaches lack user-specific adaptability, often relying on generic interaction paradigms that fail to account for individual behavioral, contextual, or socio-emotional nuances. When customization is attempted, ethical concerns arise from unmitigated biases in user data, risking exclusion or unfair treatment. To address these dual challenges, we propose User-VLM 360°, a holistic framework integrating multimodal user modeling with bias-aware optimization. Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360° socio-emotive interaction datasets annotated with demographic, emotion, and relational metadata. Evaluations across eight benchmarks demonstrate state-of-the-art results: +35.3% F1 in personalized VQA, +47.5% F1 in facial features understanding, 15% bias reduction, and 30× speedup over baselines. Ablation studies confirm component efficacy, and deployment on the Pepper robot validates real-time adaptability across diverse users. We open-source parameter-efficient 3B/10B models and an ethical verification framework for responsible adaptation.
Feb-14-2025
- Country:
- Europe
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- France > Île-de-France
- Asia > Middle East
- UAE (0.04)
- Africa > Middle East
- Morocco > Rabat-Salé-Kénitra Region > Rabat (0.04)
- Genre:
- Research Report > Experimental Study (0.46)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Robots (1.00)
- Natural Language > Large Language Model (1.00)
- Machine Learning > Neural Networks
- Deep Learning (0.47)
- Perceptrons (0.46)