Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol

Wang, Weiqi, Ou, Jiefu, Song, Yangqiu, Van Durme, Benjamin, Khashabi, Daniel

Aug-4-2025–arXiv.org Artificial Intelligence

Literature review tables are essential for summarizing and comparing collections of scientific papers. We explore the task of generating tables that best fulfill a user's informational needs given a collection of scientific papers. Building on recent work (Newman et al., 2024), we extend prior approaches to address real-world complexities through a combination of LLM-based methods and human annotations. Our contributions focus on three key challenges encountered in real-world use: (i) User prompts are often under-specified; (ii) Retrieved candidate papers frequently contain irrelevant content; and (iii) Task evaluation should move beyond shallow text similarity techniques and instead assess the utility of inferred tables for information-seeking tasks (e.g., comparing papers). To support reproducible evaluation, we introduce ARXIV2TABLE, a more realistic and challenging benchmark for this task, along with a novel approach to improve literature review table generation in real-world scenarios. Our extensive experiments on this benchmark show that both open-weight and proprietary LLMs struggle with the task, highlighting its difficulty and the need for further advancements. Our dataset and code are available at https://github.com/JHU-CLSP/arXiv2Table.

large language model, machine learning, user demand, (20 more...)

arXiv.org Artificial Intelligence

Aug-4-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (1.00)
- Europe (1.00)
- Asia (1.00)

Genre:
- Overview > Innovation (0.34)
- Research Report
  - New Finding (0.46)
  - Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found