



Multi-modal Situated Reasoning in 3D Scenes

Neural Information Processing Systems

Situation awareness is essential for understanding and reasoning about 3D scenes in embodied AI agents. However, existing datasets and benchmarks for situated understanding suffer from severe limitations in data modality, scope, diversity, and scale. To address these limitations, we propose Multi-modal Situated Question Answering (MSQA), a large-scale multi-modal situated reasoning dataset, scalably collected by leveraging 3D scene graphs and vision-language models (VLMs) across a diverse range of real-world 3D scenes. MSQA includes 251K situated question-answering pairs across 9 distinct question categories, covering complex scenarios and object modalities within 3D scenes. We introduce a novel interleaved multi-modal input setting in our benchmark that provides texts, images, and point clouds for situation and question descriptions, aiming to resolve the ambiguity of describing situations with single-modality inputs (e.g., texts). Additionally, we devise the Multi-modal Situated Next-step Navigation (MSNN) benchmark to evaluate models' grounding of actions and transitions between situations. Comprehensive evaluations on reasoning and navigation tasks highlight the limitations of existing vision-language models and underscore the importance of handling multi-modal interleaved inputs and situation modeling. Experiments on data scaling and cross-domain transfer further demonstrate the effectiveness of leveraging MSQA as a pre-training dataset for developing more powerful situated reasoning models, contributing to advancements in 3D scene understanding for embodied AI.
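
To make the interleaved input setting concrete, the sketch below shows one way a situated question might be represented as a sequence of text, image, and point-cloud segments. This is a minimal Python illustration under assumed field names and file paths; it is not the dataset's actual schema.

from dataclasses import dataclass
from typing import List, Union

@dataclass
class TextSegment:
    text: str  # natural-language fragment of the situation or question

@dataclass
class ImageSegment:
    image_path: str  # ego-view image that disambiguates the situation

@dataclass
class PointCloudSegment:
    points_path: str  # object-level point cloud referenced in the text

# An interleaved situation/question is a sequence of modality segments.
InterleavedInput = List[Union[TextSegment, ImageSegment, PointCloudSegment]]

# Hypothetical example (paths and wording are illustrative, not from MSQA).
situation: InterleavedInput = [
    TextSegment("You are standing beside"),
    PointCloudSegment("objects/chair_03.ply"),
    TextSegment("and facing the view shown in"),
    ImageSegment("views/ego_view_12.png"),
]
question: InterleavedInput = [TextSegment("What is the object on your left?")]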





MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge

Cheung, Jerry Junyang, Shen, Shiyao, Zhuang, Yuchen, Li, Yinghao, Ramprasad, Rampi, Zhang, Chao

arXiv.org Artificial Intelligence

Despite recent advances in large language models (LLMs) for materials science, there is a lack of benchmarks for evaluating their domain-specific knowledge and complex reasoning abilities. To bridge this gap, we introduce MSQA, a comprehensive evaluation benchmark of 1,757 graduate-level materials science questions in two formats: detailed explanatory responses and binary True/False assessments. MSQA distinctively challenges LLMs by requiring both precise factual knowledge and multi-step reasoning across seven materials science sub-fields, such as structure-property relationships, synthesis processes, and computational modeling. Through experiments with 10 state-of-the-art LLMs, we identify significant gaps in current LLM performance. While API-based proprietary LLMs achieve up to 84.5% accuracy, open-source (OSS) LLMs peak around 60.5%, and domain-specific LLMs often underperform significantly due to overfitting and distributional shifts. MSQA represents the first benchmark to jointly evaluate the factual and reasoning capabilities of LLMs that are crucial for advanced materials science.
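
As a rough illustration of how the binary True/False portion of such a benchmark could be scored, the sketch below computes accuracy over a list of items. The item format and the query_llm callable are assumptions made for this example, not MSQA's actual interface.

from typing import Callable, Dict, List

def score_true_false(items: List[Dict[str, str]],
                     query_llm: Callable[[str], str]) -> float:
    """Return accuracy of an LLM on binary True/False items.

    Each item is assumed (hypothetically) to look like:
        {"question": "...", "answer": "True"}  # or "False"
    """
    correct = 0
    for item in items:
        prompt = f"{item['question']}\nAnswer with exactly one word: True or False."
        prediction = query_llm(prompt).strip().lower()
        # Compare the normalized prediction against the gold label.
        if prediction.startswith(item["answer"].strip().lower()):
            correct += 1
    return correct / len(items) if items else 0.0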


Multi-modal Situated Reasoning in 3D Scenes

Linghu, Xiongkun, Huang, Jiangyong, Niu, Xuesong, Ma, Xiaojian, Jia, Baoxiong, Huang, Siyuan

arXiv.org Artificial Intelligence

Situation awareness is essential for understanding and reasoning about 3D scenes in embodied AI agents. However, existing datasets and benchmarks for situated understanding are limited in data modality, diversity, scale, and task scope. To address these limitations, we propose Multi-modal Situated Question Answering (MSQA), a large-scale multi-modal situated reasoning dataset, scalably collected by leveraging 3D scene graphs and vision-language models (VLMs) across a diverse range of real-world 3D scenes. MSQA includes 251K situated question-answering pairs across 9 distinct question categories, covering complex scenarios within 3D scenes. We introduce a novel interleaved multi-modal input setting in our benchmark that provides text, images, and point clouds for situation and question descriptions, resolving the ambiguity of previous single-modality conventions (e.g., text-only). Additionally, we devise the Multi-modal Situated Next-step Navigation (MSNN) benchmark to evaluate models' situated reasoning for navigation. Comprehensive evaluations on MSQA and MSNN highlight the limitations of existing vision-language models and underscore the importance of handling multi-modal interleaved inputs and situation modeling. Experiments on data scaling and cross-domain transfer further demonstrate the efficacy of leveraging MSQA as a pre-training dataset for developing more powerful situated reasoning models.
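
For illustration, a next-step navigation query in the spirit of MSNN could pair a situation description with a target object and a single gold next action, as in the hedged sketch below; the action vocabulary, field names, and sample values are assumptions, not drawn from the benchmark.

from dataclasses import dataclass
from typing import Tuple

# Assumed discrete action space for the next-step prediction task.
ACTIONS = ("move_forward", "turn_left", "turn_right", "stop")

@dataclass
class NextStepSample:
    situation: str                         # textual situation description
    position: Tuple[float, float, float]   # agent location in scene coordinates
    target_object: str                     # navigation goal
    next_action: str                       # gold next step, one of ACTIONS

sample = NextStepSample(
    situation="You are facing the sofa, with the window on your right.",
    position=(1.2, 0.0, -0.5),
    target_object="the lamp on the desk",
    next_action="turn_left",
)
assert sample.next_action in ACTIONS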