Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning

Chaubey, Ashutosh, Guan, Xulang, Soleymani, Mohammad

Apr-11-2025–arXiv.org Artificial Intelligence

The human face plays a central role in social communication, necessitating the use of performant computer vision tools for human-centered applications. W e propose Face-LLaVA, a multimodal large language model for face-centered, in-context learning, including facial expression and attribute recognition. Additionally, Face-LLaVA is able to generate natural language descriptions that can be used for reasoning. Leveraging existing visual databases, we first developed FaceInstruct-1M, a face-centered database for instruction tuning MLLMs for face processing. W e then developed a novel face-specific visual encoder powered by Face-Region Guided Cross-Attention that integrates face geometry with local visual features. W e evaluated the proposed method across nine different datasets and five different face processing tasks, including facial expression recognition, action unit detection, facial attribute detection, age estimation and deepfake detection. Face-LLaVA achieves superior results compared to existing open-source MLLMs and competitive performance compared to commercial solutions. Our model output also receives a higher reasoning rating by GPT under a zero-shot setting across all the tasks. Both our dataset and model wil be released at https://face-llava.github.io/ to support future advancements in social AI and foundational vision-language research.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

Apr-11-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre:
- Research Report (1.00)
- Overview (0.92)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision > Face Recognition (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.97)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found