BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions

Sengupta, Saptarshi, Yang, Shuhua, Yu, Paul Kwong, Wang, Fali, Wang, Suhang

Jun-9-2025–arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) has proven highly effective at improving Large Language Models (LLMs). However, most existing RAG-based LLMs retrieve information from a single modality, mainly text, whereas in many real-world domains, such as healthcare, information relevant to a query can appear in several modalities, including knowledge graphs, text (clinical notes), and complex molecular structures. It is therefore important to be able to retrieve relevant multimodal, domain-specific information and to reason over and synthesize diverse knowledge to generate an accurate response. To address this gap, we present BioMol-MQA, a new question-answering (QA) dataset on polypharmacy, which is composed of two parts: (i) a multimodal knowledge graph (KG) with text and molecular structure for information retrieval; and (ii) challenging questions that are designed to test LLM capabilities in retrieving and reasoning over the multimodal KG to answer questions. Our benchmarks indicate that existing LLMs struggle to answer these questions and do well only when given the necessary background data, signaling the necessity for strong RAG frameworks.
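To make the two-part setup concrete, the following is a minimal sketch of how a multimodal KG with text and molecular-structure attributes might be queried to build context for an LLM. It is not the authors' implementation or the dataset's actual schema: the class names (`DrugNode`, `MultimodalKG`), the naive string-matching retriever, and all node names, descriptions, SMILES strings, and relations are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DrugNode:
    """A KG node carrying two modalities (hypothetical schema)."""
    name: str
    description: str  # text modality
    smiles: str       # molecular-structure modality, stored as a SMILES string


@dataclass
class MultimodalKG:
    """Toy knowledge graph: nodes keyed by name, edges as (head, relation, tail)."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: DrugNode) -> None:
        self.nodes[node.name] = node

    def add_edge(self, head: str, relation: str, tail: str) -> None:
        self.edges.append((head, relation, tail))

    def retrieve(self, question: str) -> list:
        """Collect text, structure, and edges for drugs named in the question.

        Assumption: a naive substring match stands in for a real retriever.
        """
        q = question.lower()
        mentioned = [name for name in self.nodes if name.lower() in q]
        context = []
        for name in mentioned:
            node = self.nodes[name]
            context.append(f"{name}: {node.description} SMILES={node.smiles}")
        for head, relation, tail in self.edges:
            if head in mentioned and tail in mentioned:
                context.append(f"{head} --{relation}--> {tail}")
        return context


def build_prompt(question: str, context: list) -> str:
    """Concatenate retrieved background and the question into one LLM prompt."""
    background = "\n".join(context)
    return f"Background:\n{background}\n\nQuestion: {question}\nAnswer:"


# Placeholder data only; these are not real drugs or real interactions.
kg = MultimodalKG()
kg.add_node(DrugNode("drug_a", "Placeholder description of drug A.", "CCO"))
kg.add_node(DrugNode("drug_b", "Placeholder description of drug B.", "CC(=O)O"))
kg.add_edge("drug_a", "placeholder_side_effect", "drug_b")

question = "What can happen when drug_a is taken together with drug_b?"
context = kg.retrieve(question)
prompt = build_prompt(question, context)
print(prompt)
```

In this sketch the retrieved context mixes all three kinds of evidence (node text, molecular structure, and graph edges) before the question is posed, which mirrors the setting in which the abstract reports that LLMs perform well: when the necessary background data is supplied.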

computational linguistic, large language model, machine learning

Country:
- Asia (1.00)
- North America > United States
  - Minnesota (0.28)

Genre:
- Overview (1.00)
- Research Report
  - New Finding (0.93)
  - Experimental Study (0.67)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Consumer Health (1.00)
  - Therapeutic Area
    - Musculoskeletal (1.00)
    - Neurology (0.92)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
