VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach

Kerroumi, Mohamed, Sayem, Othmane, Shabou, Aymen

Oct-13-2020–arXiv.org Artificial Intelligence

We introduce a novel approach to scanned document representation for field extraction. It encodes textual, visual, and layout information simultaneously in a 3D matrix that serves as the input to a segmentation model. We improve on the recent Chargrid and Wordgrid models in several ways: first by taking the visual modality into account, then by increasing robustness on small datasets while keeping inference time low. Our approach is tested on public and private document-image datasets and shows higher performance than recent state-of-the-art methods.
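The idea of packing text, layout, and pixels into one 3D matrix can be illustrated with a minimal sketch. This is not the authors' exact encoding: the channel layout (word-embedding channels followed by RGB), the helper names `toy_embedding` and `build_grid`, and the hash-based embedding are all assumptions made for illustration, and a real system would use learned embeddings and OCR output.

```python
import hashlib


def toy_embedding(word, dim):
    """Hypothetical stand-in for a learned word embedding (hash bytes scaled to [0, 1])."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(dim)]


def build_grid(image, words, dim=4):
    """Return an H x W x (dim + 3) nested list combining text, layout, and pixels.

    image: H x W x 3 nested list of RGB values in [0, 1].
    words: list of (text, x0, y0, x1, y1) pixel boxes; x1 and y1 are exclusive.
    """
    height, width = len(image), len(image[0])
    # Every cell starts with zeroed text channels followed by its own RGB values.
    grid = [
        [[0.0] * dim + list(image[y][x]) for x in range(width)]
        for y in range(height)
    ]
    for text, x0, y0, x1, y1 in words:
        vec = toy_embedding(text, dim)
        # Layout is encoded by where the embedding is written: inside the word's box.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x][:dim] = vec
    return grid


# A blank white 4 x 8 "scan" with two hypothetical OCR word boxes.
image = [[[1.0, 1.0, 1.0] for _ in range(8)] for _ in range(4)]
words = [("Total", 0, 1, 3, 2), ("42.00", 5, 1, 8, 2)]
grid = build_grid(image, words)
```

In this sketch a segmentation model would receive `grid` as its input tensor and predict a field label per cell; cells outside any word box carry only visual information.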

artificial intelligence, machine learning, natural language

Country:
- North America > United States
  - California > San Diego County > San Diego (0.04)
- Europe
  - France > Île-de-France
    - Hauts-de-Seine > Montrouge (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)

Genre:
- Research Report > Promising Solution (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.95)
  - Natural Language > Information Extraction (0.68)
