Emotion Knowledge Enhancement for Vision Large Language Models: A Self-Verification Approach for High-Quality Emotion Instruction Data Generation

Wang, Feifan, Song, Tengfei, He, Minggui, Su, Chang, Wu, Zhanglin, Yang, Hao, Zheng, Wenming, Yoshie, Osamu

May-27-2025–arXiv.org Artificial Intelligence

Facial emotion perception in the vision large language model (VLLM) is crucial for achieving natural human-machine interaction. However, creating high-quality annotations for both coarse- and fine-grained facial emotion analysis demands costly expertise. The lack of such high-quality instruction data limits the performance of VLLMs in facial emotion perception. To address this, we propose a self-verification approach with emotion knowledge enhancement (SEKE), which generates high-quality instruction data for multi-grained emotion analysis cost-effectively using closed-source VLLM. This approach integrates prior human knowledge to VLLM inference, guided by the inherent correlations between three grained levels of emotion descriptions, i.e., discrete expression, valence-arousal, and action unit, to reliably generate comprehensive annotations. A self-verification strategy with Uncertainty-Aware Monte Carlo sampling (SV-UAMC) is further embedded to efficiently extract more accurate VLLM predictions, further improving annotation reliability. Consequently, we construct a facial emotion instruction dataset (FEID) containing three comprehensive descriptions, which provides coarse- and fine-grained emotional information for effective model training. Additionally, we introduce a facial emotion analysis benchmark (FEAB) to measure the VLLM's corresponding ability. Our method significantly outperforms state-of-the-art methods on three downstream facial emotion analysis tasks.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

May-27-2025

arXiv.org PDF

Add feedback

Country:
- Asia
  - China
    - Beijing > Beijing (0.04)
    - Jiangsu Province > Nanjing (0.05)
  - Japan > Kyūshū & Okinawa
    - Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)
- North America > United States
  - District of Columbia > Washington (0.05)
  - New York > Broome County
    - Binghamton (0.04)
- Oceania > Pitcairn (0.04)

Genre:
- Research Report > Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Vision > Face Recognition (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found