Automated Multi-level Preference for MLLMs
Neural Information Processing Systems
Current Multimodal Large Language Models (MLLMs) suffer from "hallucination": they occasionally generate responses that are not grounded in the input images. One promising path to tackling this challenge is reinforcement learning from human feedback (RLHF), which steers MLLMs towards learning superior responses while avoiding inferior ones. We rethink the common practice of using binary preferences (i.e., superior, inferior) and find that adopting multi-level preferences (e.g., superior, medium, inferior) offers two benefits: 1) it narrows the gap between adjacent levels, thereby encouraging MLLMs to discern subtle differences.
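To make the multi-level idea concrete, here is a minimal sketch (not the paper's exact objective): responses are ranked into K preference levels, and a DPO-style logistic loss is applied to every adjacent pair of levels, which is what pushes the model to discern the subtle differences the abstract describes. The function name and the per-level log-probability inputs are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def multilevel_preference_loss(policy_logps: torch.Tensor,
                               ref_logps: torch.Tensor,
                               beta: float = 0.1) -> torch.Tensor:
    """policy_logps, ref_logps: (batch, K) sequence log-probs of K responses,
    ordered from most preferred (level 0) to least preferred (level K-1)."""
    # Implicit per-response reward under the DPO parameterization.
    rewards = beta * (policy_logps - ref_logps)   # (batch, K)
    better = rewards[:, :-1]                      # levels 0 .. K-2
    worse = rewards[:, 1:]                        # levels 1 .. K-1
    # Logistic (Bradley-Terry) loss on every adjacent (better, worse) pair,
    # so each level must only beat its immediate neighbor.
    return -F.logsigmoid(better - worse).mean()

# Toy usage: batch of 2, K = 3 levels (superior, medium, inferior).
policy = torch.tensor([[-4.0, -6.0, -9.0], [-3.5, -5.0, -8.0]])
ref = torch.tensor([[-5.0, -5.5, -7.0], [-4.0, -4.8, -7.5]])
print(multilevel_preference_loss(policy, ref))
```

Restricting comparisons to adjacent levels is one plausible design choice consistent with the stated benefit; the paper's actual loss may combine non-adjacent pairs as well.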