Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends

Barboule, Camille, Piwowarski, Benjamin, Chabot, Yoan

Jan-4-2025–arXiv.org Artificial Intelligence

Using Large Language Models (LLMs) for Visually-rich Document Understanding (VrDU) has significantly improved performance on tasks requiring both comprehension and generation, such as question answering, but it has also introduced new challenges. This survey explains how LLM-enhanced VrDU models work, covering methods for integrating visually rich document (VrD) features into LLMs and highlighting key challenges.

large language model, machine learning, natural language

Genre:
- Research Report (0.40)
- Overview (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)
