InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback

Zhao, Henry Hengyuan, Pei, Wenqi, Tao, Yifei, Mei, Haiyang, Shou, Mike Zheng

Mar-8-2025–arXiv.org Artificial Intelligence

Existing benchmarks do not test Large Multimodal Models (LMMs) on their interactive intelligence with human users, which is vital for developing general-purpose AI assistants. We design InterFeedback, an interactive framework that can be applied to any LMM and dataset to assess this ability autonomously. On top of this, we introduce InterFeedback-Bench, which evaluates interactive intelligence using two representative datasets, MMMU-Pro and MathVerse, to test 10 different open-source LMMs. Additionally, we present InterFeedback-Human, a newly collected dataset of 120 cases designed for manually testing interactive performance in leading models such as OpenAI-o1 and Claude-3.5-Sonnet. Our evaluation results indicate that even the state-of-the-art LMM, OpenAI-o1, struggles to refine its responses based on human feedback, achieving an average score of less than 50%. Our findings point to the need for methods that can enhance LMMs' capabilities to interpret and benefit from feedback.

In this paper, we ask: "Can Large Multimodal Models evolve through Interactive Human Feedback?" This question is central to developing general-purpose AI assistants with LMMs. While these models show exceptional performance when tackling multimodal tasks directly, their ability to interact with humans remains largely unknown. We argue that an LMM functioning as a general assistant should possess two capabilities: 1) exceptional problem-solving ability and 2) the ability to improve itself through feedback (e.g., human feedback, execution results).

Country:
- Asia > Singapore (0.04)
- North America > Mexico
  - Mexico City > Mexico City (0.04)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Health & Medicine (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Personal Assistant Systems (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning
      - Generative AI (0.47)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.46)
