ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents

Pala, Furkan, Akpınar, Mehmet Yasin, Deniz, Onur, Eryiğit, Gülşen

Sep-23-2024, arXiv.org Artificial Intelligence

Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents, but their application to unstructured documents is still an emerging research topic. This paper presents an approach that adapts a multimodal transformer (ViBERTgrid, previously explored on semi-structured documents) to unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in the financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, we publicly release token-level annotations for the SROIE dataset to pave the way for its use in multimodal sequence labeling models.
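To make the role of the CRF layer concrete, the following is a minimal sketch of Viterbi decoding over per-token scores, not the authors' implementation. In a BiLSTM-CRF, the emission scores would come from the BiLSTM and the transition scores would be learned by the CRF; here the label set, the tokens, and all score values are hypothetical and chosen only to show how transition scores can rule out an invalid tag sequence that independent per-token prediction would produce.

```python
from typing import List

# Hypothetical BIO label set, for illustration only.
LABELS = ["O", "B-ORG", "I-ORG"]


def greedy_decode(emissions: List[List[float]]) -> List[int]:
    """Pick the best label for each token independently (no CRF)."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in emissions]


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label path.

    emissions[t][j]   : score of label j at token t
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    best = list(emissions[0])  # best path score ending in each label so far
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, ptrs = [], []
        for j in range(n_labels):
            # Best previous label i for the current label j.
            i_best = max(range(n_labels),
                         key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j]
                            + emissions[t][j])
            ptrs.append(i_best)
        best = new_best
        backptrs.append(ptrs)
    # Trace the best path backwards from the best final label.
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for ptrs in reversed(backptrs):
        last = ptrs[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical scores for three tokens, e.g. "Acme", "Bank", "announced".
EMISSIONS = [
    [1.0, 0.9, 0.0],
    [0.2, 0.1, 1.0],
    [1.0, 0.0, 0.0],
]
# A large penalty on O -> I-ORG encodes that I-ORG may not follow O.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 0.5],    # from B-ORG
    [0.0, 0.0, 0.5],    # from I-ORG
]

greedy_tags = [LABELS[i] for i in greedy_decode(EMISSIONS)]
crf_tags = [LABELS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(greedy_tags)  # ['O', 'I-ORG', 'O']
print(crf_tags)     # ['B-ORG', 'I-ORG', 'O']
```

With these toy scores, independent per-token prediction yields an I-ORG tag directly after O, which is not a valid BIO sequence, while the path-level decoding prefers B-ORG for the first token because the transition penalty makes the invalid path expensive.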

Country:
- North America > United States
  - Louisiana > Orleans Parish > New Orleans (0.04)
- Europe
  - Switzerland > Geneva
    - Geneva (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
- Asia > Middle East
  - Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre:
- Research Report (1.00)

Industry:
- Banking & Finance (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Processing (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)