DocGraphLM: Documental Graph Language Model for Information Extraction

Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah

arXiv.org Artificial Intelligence 

Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts. Two tropes of architectures have emerged: transformer-based models inspired by LLMs, and Graph Neural Networks. In this paper, we introduce DocGraphLM, a novel framework that combines pre-trained language models with graph semantics. To achieve this, we propose 1) a joint encoder architecture to represent documents, and 2) a novel link prediction approach to reconstruct document graphs. DocGraphLM predicts both directions and distances between nodes using a convergent joint loss function that prioritizes neighborhood restoration and downweighs distant node detection.

Information extraction from visually-rich documents (VrDs), such as business forms, receipts, and invoices in PDF or image format, has gained recent traction. Tasks such as field identification and extraction, and entity linkage are crucial to digitizing VrDs and building information retrieval systems on the data. Tasks that require complex reasoning, such as Visual Question Answering over documents, require modeling the spatial, visual, and semantic signals in VrDs. Therefore, VrD Understanding is concerned with modeling the multi-modal content in image documents. Previous research has explored the use of encoding text, layout, and image features in a layout language model or multi-modal setting to improve
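Below is a minimal sketch of how the joint link-prediction loss described in the abstract might be realized in PyTorch, assuming cross-entropy over discretized direction bins, smooth-L1 regression on log-scaled pixel distances, and an exponential weight that downweighs distant pairs. The class name JointLinkLoss, the hyperparameters alpha, beta, and tau, and the exact weighting scheme are illustrative assumptions, not the paper's published formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointLinkLoss(nn.Module):
    # Hypothetical joint loss for document-graph link prediction:
    # classify the direction of each node pair and regress their distance,
    # downweighting pairs that lie far apart so that restoring a node's
    # immediate neighborhood dominates the objective.
    def __init__(self, alpha=1.0, beta=1.0, tau=50.0):
        super().__init__()
        self.alpha = alpha  # weight of the direction-classification term
        self.beta = beta    # weight of the distance-regression term
        self.tau = tau      # decay scale (in pixels) for the pair weights

    def forward(self, dir_logits, dist_pred, dir_target, dist_target):
        # dir_logits:  (P, D) logits over D discretized directions
        # dist_pred:   (P,) non-negative predicted pixel distances
        # dir_target:  (P,) ground-truth direction bins
        # dist_target: (P,) ground-truth pixel distances
        weight = torch.exp(-dist_target / self.tau)  # smaller for distant pairs
        dir_loss = F.cross_entropy(dir_logits, dir_target, reduction="none")
        dist_loss = F.smooth_l1_loss(  # log scale tames the dynamic range
            torch.log1p(dist_pred), torch.log1p(dist_target), reduction="none")
        per_pair = weight * (self.alpha * dir_loss + self.beta * dist_loss)
        return per_pair.mean()

# Toy usage on four candidate node pairs with eight direction bins.
loss_fn = JointLinkLoss()
dir_logits = torch.randn(4, 8)
dist_pred = torch.rand(4) * 100
dir_target = torch.randint(0, 8, (4,))
dist_target = torch.rand(4) * 100
print(loss_fn(dir_logits, dist_pred, dir_target, dist_target))

The exponential weight is one way to realize the "downweighs distant node detection" behavior; any monotonically decreasing function of the true distance would serve the same purpose.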