SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Singh, Shruti, Sarkar, Nandan, Cohan, Arman

Nov-8-2024–arXiv.org Artificial Intelligence

Scientific literature is typically dense, requiring significant background knowledge and deep comprehension for effective engagement. We introduce SciDQA, a new dataset for reading comprehension that challenges LLMs for a deep understanding of scientific articles, consisting of 2,937 QA pairs. Unlike other scientific QA datasets, SciDQA sources questions from peer reviews by domain experts and answers by paper authors, ensuring a thorough examination of the literature. We enhance the dataset's quality through a process that carefully filters out lower quality questions, decontextualizes the content, tracks the source document across different versions, and incorporates a bibliography for multi-document question-answering. Questions in SciDQA necessitate reasoning across figures, tables, equations, appendices, and supplementary materials, and require multi-document reasoning. We evaluate several open-source and proprietary LLMs across various configurations to explore their capabilities in generating relevant and factual responses. Our comprehensive evaluation, based on metrics for surface-level similarity and LLM judgements, highlights notable performance discrepancies. SciDQA represents a rigorously curated, naturally derived scientific QA dataset, designed to facilitate research on complex scientific text understanding.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

Nov-8-2024

arXiv.org PDF

Add feedback

Country:
- Asia (0.67)
- North America > United States (0.46)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Energy > Oil & Gas
  - Upstream (0.50)
- Education > Assessment & Standards
  - Student Performance (0.61)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)