LOVA3: Learning to Visual Question Answering, Asking and Assessment

Mar-22-2026, 14:29:32 GMT–Neural Information Processing Systems

Question answering, asking, and assessment are three innate human traits crucial for understanding the world and acquiring knowledge. By enhancing these capabilities, humans can more effectively utilize data, leading to better comprehension and learning outcomes. However, current Multimodal Large Language Models (MLLMs) primarily focus on question answering, often neglecting the full potential of questioning and assessment skills. In this study, we introduce LOVA3, an innovative framework named "Learning tO Visual Question Answering, Asking and Assessment," designed to equip MLLMs with these additional capabilities. Our approach involves the creation of two supplementary training tasks, GenQA and EvalQA, which aim to foster the skills of asking and assessing questions in the context of images.

artificial intelligence, natural language, proceedings

Genre:
- Research Report > New Finding (0.56)

Technology:
- Information Technology > Artificial Intelligence > Natural Language (1.00)