LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei
Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis. The code and models are publicly available at https://aka.ms/layoutlmv3.
arXiv.org Artificial Intelligence
Jul-19-2022
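For readers who want to try the released models on a text-centric task such as form understanding, the sketch below shows a token-classification setup. It is a minimal illustration assuming the Hugging Face transformers integration of LayoutLMv3 and the "microsoft/layoutlmv3-base" checkpoint; the words, bounding boxes, and label set are made-up placeholders, not data or hyperparameters from the paper.

```python
# Minimal sketch: fine-tuning-style forward pass with LayoutLMv3 for token
# classification (e.g., form understanding). Assumes the Hugging Face
# `transformers` package and the public "microsoft/layoutlmv3-base" checkpoint.
from PIL import Image
import torch
from transformers import AutoProcessor, LayoutLMv3ForTokenClassification

# apply_ocr=False: we supply our own OCR words and boxes instead of running OCR.
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = LayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base",
    num_labels=3,  # hypothetical label set, e.g. question / answer / other
)

# Toy document page: a blank image standing in for a scanned page, plus
# OCR words with boxes normalized to the 0-1000 coordinate space that
# LayoutLM-style models expect.
image = Image.new("RGB", (762, 1000), color="white")
words = ["Invoice", "No:", "12345"]
boxes = [[60, 40, 180, 70], [190, 40, 240, 70], [250, 40, 330, 70]]
word_labels = [0, 0, 1]  # placeholder per-word labels

encoding = processor(image, words, boxes=boxes, word_labels=word_labels,
                     return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoding)

print(outputs.loss)          # cross-entropy loss over the labeled word tokens
print(outputs.logits.shape)  # (batch, sequence_length, num_labels)
```

In an actual fine-tuning run, the same encoding step would be applied to each annotated page and the loss back-propagated with an optimizer; the snippet only demonstrates how text, layout (boxes), and image inputs are combined in one forward pass, which is the multimodal interface the paper describes.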