Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Dec-23-2025, 19:06:17 GMT–Neural Information Processing Systems

When answering a question, humans draw on information across different modalities to synthesize a consistent and complete chain of thought (CoT). In deep learning models such as large-scale language models, this process is normally a black box. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of AI systems. However, existing datasets either fail to provide annotations for the answers or are restricted to the text-only modality, small scale, and limited domain diversity. To this end, we present Science Question Answering (ScienceQA), a new benchmark of ~21k multimodal multiple-choice questions that covers a diverse set of science topics and annotates the answers with corresponding lectures and explanations.

language model, multimodal reasoning, thought chain

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.96)
  - Machine Learning > Neural Networks
    - Deep Learning (0.78)