BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
Neural Information Processing Systems
In this paper, we introduce BMMR, a large-scale bilingual, multimodal, multidisciplinary reasoning dataset for the community to develop and evaluate large multimodal models (LMMs). BMMR comprises 110k college-level questions spanning 300 UNESCO-defined subjects, covering diverse formats (multiple-choice, fill-in-the-blank, and open-ended QA) and sourced from both print and digital media such as books, exams, and quizzes. All data are curated and filtered via a scalable, human-in-the-loop framework, and each instance is paired with a high-quality reasoning path. The dataset is organized into two parts: BMMR-Eval, which comprises 20,458 high-quality instances to comprehensively assess LMMs' knowledge and reasoning across multiple disciplines in both Chinese and English; and BMMR-Train, which contains 88,991 instances to support further research and development, extending the current focus on mathematical reasoning to diverse disciplines and domains. In addition, we propose the process-based multi-discipline verifier (i.e., BMMR-Verifier) for accurate and fine-grained evaluation of reasoning paths. Extensive experiments on 24 models reveal that (i) even SOTA models (e.g., o3 and Gemini-2.5-Pro)
Jun-17-2026, 13:03:18 GMT
- Country:
  - Asia
    - China (0.46)
    - Middle East > UAE (0.28)
- Genre:
  - Research Report > Experimental Study (1.00)
  - Workflow (0.67)
- Industry:
  - Education > Educational Setting (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)