Automated Multi-level Preference for MLLMs
Neural Information Processing Systems
Current Multimodal Large Language Models (MLLMs) suffer from "hallucination": they occasionally generate responses that are not grounded in the input images. One promising path to tackling this challenge is reinforcement learning from human feedback (RLHF), which steers MLLMs towards learning superior responses while avoiding inferior ones. We rethink the common practice of using binary preferences (i.e., superior, inferior) and find that adopting multi-level preferences (e.g., superior, medium, inferior) offers two benefits: 1) it narrows the gap between adjacent levels, thereby encouraging MLLMs to discern subtle differences.
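To make the multi-level idea concrete, here is a minimal sketch (not the paper's exact objective): responses are ranked into K preference levels, and a DPO-style logistic loss is applied to every adjacent pair of levels, which is what pushes the model to discern the subtle differences the abstract describes. The function name and the per-level log-probability inputs are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def multilevel_preference_loss(policy_logps: torch.Tensor,
                               ref_logps: torch.Tensor,
                               beta: float = 0.1) -> torch.Tensor:
    """policy_logps, ref_logps: (batch, K) sequence log-probs of K responses,
    ordered from most preferred (level 0) to least preferred (level K-1)."""
    # Implicit per-response reward under the DPO parameterization.
    rewards = beta * (policy_logps - ref_logps)   # (batch, K)
    better = rewards[:, :-1]                      # levels 0 .. K-2
    worse = rewards[:, 1:]                        # levels 1 .. K-1
    # Logistic (Bradley-Terry) loss on every adjacent (better, worse) pair,
    # so each level must only beat its immediate neighbor.
    return -F.logsigmoid(better - worse).mean()

# Toy usage: batch of 2, K = 3 levels (superior, medium, inferior).
policy = torch.tensor([[-4.0, -6.0, -9.0], [-3.5, -5.0, -8.0]])
ref = torch.tensor([[-5.0, -5.5, -7.0], [-4.0, -4.8, -7.5]])
print(multilevel_preference_loss(policy, ref))
```

Restricting comparisons to adjacent levels is one plausible design choice consistent with the stated benefit; the paper's actual loss may combine non-adjacent pairs as well.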