UniDoc: Unified Pretraining Framework for Document Understanding

Oct-9-2024, 09:02:45 GMT–Neural Information Processing Systems

Document intelligence automates the extraction of information from documents and supports many business applications. Recent self-supervised learning methods on large-scale unlabeled document datasets have opened up promising directions towards reducing annotation efforts by training models with self-supervised objectives. However, most of the existing document pretraining methods are still language-dominated. UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input. Each input element is composed of words and visual features from a semantic region of the input document image. An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses, encouraging the representation to model sentences, learn similarities, and align modalities.

representation, unidoc, unified pretraining framework, (1 more...)

Neural Information Processing Systems

Oct-9-2024, 09:02:45 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.65)