A Large-Scale Human-Centric Benchmark for Referring Expression Comprehension in the LMM Era

Neural Information Processing Systems 

Prior research in human-centric AI has primarily addressed single-modality tasks such as pedestrian detection, action recognition, and pose estimation. However, the emergence of large multimodal models (LMMs) such as GPT-4V has shifted attention toward integrating language with visual content. Referring expression comprehension (REC), which requires localizing the image region described by a natural-language expression, is a prime example of this multimodal direction.