ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents
Pala, Furkan, Akpınar, Mehmet Yasin, Deniz, Onur, Eryiğit, Gülşen
arXiv.org Artificial Intelligence
Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents, but their application to unstructured documents is an emerging research topic. This paper presents an approach for adapting a multimodal transformer (ViBERTgrid, previously explored on semi-structured documents) to unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model significantly improves named entity recognition performance on unstructured documents in the financial domain (by up to 2 percentage points), while maintaining its KIE performance on semi-structured documents. As an additional contribution, we have publicly released token-level annotations for the SROIE dataset to pave the way for its use in multimodal sequence labeling models.
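To illustrate what a CRF layer adds on top of per-token scores, the following is a minimal sketch of Viterbi decoding in plain Python. It is not the authors' implementation: the BIO tag set, the emission scores (standing in for BiLSTM outputs), and the transition scores are all made-up assumptions, and start/end transition scores are omitted for brevity.

```python
from typing import List, Sequence

# Hypothetical tag set; the abstract does not state the tagging scheme used.
TAGS = ["O", "B-ORG", "I-ORG"]

def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]: score of tag j at token t (e.g. produced by a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(transitions)
    # best[j]: score of the best path that ends in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for reaching tag j at token t
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path back from the best final tag
    tag = max(range(num_tags), key=lambda j: best[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path

# Made-up scores for three tokens. Picking the best tag per token
# independently would give O, I-ORG, O, which is invalid under BIO.
EMISSIONS = [
    [2.0, 1.5, 0.0],
    [0.0, 0.5, 2.0],
    [2.0, 0.0, 0.5],
]
# Only the O -> I-ORG transition is penalized in this toy example.
TRANSITIONS = [
    [0.0, 0.0, -100.0],  # from O
    [0.0, 0.0, 0.0],     # from B-ORG
    [0.0, 0.0, 0.0],     # from I-ORG
]

decoded = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(decoded)
```

With these toy scores the decoder returns B-ORG, I-ORG, O: the transition penalty steers the first token to B-ORG even though O has the higher emission score there. In a trained model the transition scores would be learned jointly with the encoder rather than set by hand.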
Sep-23-2024
- Country:
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Europe > Switzerland > Geneva > Geneva (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Banking & Finance (0.68)
- Technology: