BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations

Giovannini, Simone, Coppini, Fabio, Gemelli, Andrea, Marinai, Simone

Jan-6-2025–arXiv.org Artificial Intelligence

We present a unified dataset for document Question-Answering (QA), obtained by combining several public datasets related to Document AI and visually rich document understanding (VRDU). Our contribution is twofold. First, we reformulate existing Document AI tasks, such as Information Extraction (IE), as Question-Answering tasks, making the dataset a suitable resource for training and evaluating Large Language Models. Second, we release the OCR of all the documents and include the exact position of the answer in the document image as a bounding box. Using this dataset, we explore the impact of different prompting techniques (which may include bounding box information) on the performance of open-weight models, identifying the most effective approaches for document comprehension.
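As a rough illustration of the reformulation described in the abstract, the sketch below turns a single Information Extraction field into a question-answer record that carries the answer's bounding box. The `Field` schema, the question template, the `field_to_qa` helper, and the 0-1000 coordinate normalization are all assumptions made for this example; they are not taken from the dataset's actual format.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """One labeled field from an IE annotation (hypothetical schema)."""
    label: str    # field name, e.g. "invoice_date"
    value: str    # field text as transcribed by OCR
    bbox: tuple   # (x0, y0, x1, y1) in page pixels


def field_to_qa(field: Field, page_width: int, page_height: int) -> dict:
    """Turn one IE field into a QA record with a normalized answer box."""
    x0, y0, x1, y1 = field.bbox
    # Assumed template: the field label becomes the subject of the question.
    question = f"What is the {field.label.replace('_', ' ')}?"
    # Assumed convention: scale pixel coordinates to a 0-1000 grid so that
    # boxes are comparable across pages of different sizes.
    norm_bbox = [
        round(1000 * x0 / page_width),
        round(1000 * y0 / page_height),
        round(1000 * x1 / page_width),
        round(1000 * y1 / page_height),
    ]
    return {"question": question, "answer": field.value, "bbox": norm_bbox}


example = field_to_qa(
    Field("invoice_date", "2024-03-01", (100, 50, 300, 80)),
    page_width=1000,
    page_height=500,
)
```

A record of this shape can be serialized with or without the `bbox` entry, which is one simple way to compare prompts that do and do not include position information.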

large language model, machine learning, question answering

Country:
- Europe
  - Greece (0.04)
  - France (0.04)
  - United Kingdom > England
    - Staffordshire > Stoke-on-Trent (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Italy > Friuli Venezia Giulia
    - Trieste Province > Trieste (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > New Finding (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Question Answering (0.92)
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)