INTERMT: Multi-Turn Interleaved Preference Alignment with Human Feedback
Neural Information Processing Systems
As multimodal large models (MLLMs) continue to advance across challenging tasks, a key question emerges: what essential capabilities are still missing? A critical aspect of human learning is continuous interaction with the environment, which is not limited to language but also involves multimodal understanding and generation. To move closer to human-level intelligence, models must similarly support multi-turn, multimodal interaction. In particular, they should comprehend interleaved multimodal contexts and respond coherently in ongoing exchanges. In this work, we present an initial exploration through INTERMT, the first preference dataset for multi-turn multimodal interaction, grounded in real human feedback.
Jun-19-2026, 17:14:42 GMT
- Genre:
  - Instructional Material (0.67)
  - Overview (0.67)
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.92)
- Industry:
  - Education (1.00)
  - Health & Medicine (0.67)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Cognitive Science > Problem Solving (0.68)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)