arXiVeri: Automatic table verification with GPT

Shin, Gyungin, Xie, Weidi, Albanie, Samuel

Jun-13-2023–arXiv.org Artificial Intelligence

Unfortunately, the process of copying numerical data from one paper to another is prone to human error. In this paper, we propose to meet this challenge through the novel task of automatic table verification (AutoTV), in which the objective is to verify the accuracy of numerical data in tables by cross-referencing cited sources. To support this task, we propose a new benchmark, arXiVeri, which comprises tabular data drawn from open-access academic papers on arXiv. We introduce metrics to evaluate the performance of a table verifier in two key areas: (i) table matching, which aims to identify the source table in a cited document that corresponds to a target table, and (ii) cell matching, which aims to locate shared cells between a target and source table and identify their row and column indices accurately. By leveraging the flexible capabilities of modern large language models (LLMs), we propose simple baselines for table verification. Our findings highlight the complexity of this task, even for state-of-the-art LLMs like OpenAI's GPT-4. The code and benchmark will be made publicly available.

large language model, machine learning, target table, (18 more...)

arXiv.org Artificial Intelligence

Jun-13-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Colorado (0.04)
- Europe > United Kingdom
  - England
    - Oxfordshire > Oxford (0.14)
    - Cambridgeshire > Cambridge (0.14)
- Asia
  - Middle East > Jordan (0.04)
  - Japan > Honshū
    - Chūbu > Toyama Prefecture > Toyama (0.04)
  - China > Shanghai
    - Shanghai (0.04)

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found