Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning
Subhojeet Pramanik, Shashank Mujumdar, Hima Patel
arXiv.org Artificial Intelligence
In this paper, we propose a multi-task learning-based framework that utilizes a combination of self-supervised and supervised pre-training tasks to learn a generic document representation. We design the network architecture and the pre-training tasks to incorporate the multi-modal document information across text, layout, and image dimensions and allow the network to work with multi-page documents. We showcase the applicability of our pre-training framework on a variety of different real-world document tasks such as document classification, document information extraction, and document retrieval. We conduct exhaustive experiments to compare performance against different ablations of our framework and state-of-the-art baselines. We discuss the current limitations and next steps for our work.
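The abstract describes combining self-supervised and supervised pre-training tasks over text, layout, and image modalities into a single training objective. A minimal sketch of that idea, assuming simple feature concatenation for modality fusion and a weighted sum of per-task losses (the function names, task names, and weights are illustrative assumptions, not the authors' implementation):

```python
# Hypothetical sketch of a multi-modal, multi-task pre-training objective.
# Modality fusion and loss weighting schemes are assumptions for illustration.

def fuse_modalities(text_feats, layout_feats, image_feats):
    """Concatenate per-modality feature vectors into one document vector."""
    return text_feats + layout_feats + image_feats  # list concatenation

def multitask_loss(task_losses, task_weights):
    """Weighted sum of individual pre-training task losses."""
    assert set(task_losses) == set(task_weights), "every task needs a weight"
    return sum(task_weights[t] * task_losses[t] for t in task_losses)

if __name__ == "__main__":
    # Toy per-modality features for one document.
    doc_vec = fuse_modalities([0.1, 0.2], [0.3], [0.4, 0.5])
    # One self-supervised and one supervised task, equally weighted.
    loss = multitask_loss(
        {"masked_lm": 2.0, "doc_classification": 1.0},
        {"masked_lm": 0.5, "doc_classification": 0.5},
    )
```

In a real framework the fusion step would be a learned encoder and each task loss would come from its own head, but the core multi-task mechanism is this weighted combination of objectives over a shared document representation.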
Sep-30-2020