Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models

He, Jiabang, Hu, Yi, Wang, Lei, Xu, Xing, Liu, Ning, Liu, Hui, Shen, Heng Tao

Jun-5-2023–arXiv.org Artificial Intelligence

Numerous pre-training techniques for visual document understanding (VDU) have recently shown substantial performance improvements across a wide range of document tasks. However, these pre-trained VDU models are not guaranteed to keep performing well when the distribution of test data differs from the distribution of training data. In this paper, to investigate how robust existing pre-trained VDU models are to various distribution shifts, we first develop an out-of-distribution (OOD) benchmark, termed Do-GOOD, for fine-Grained analysis of Document image-related tasks. The Do-GOOD benchmark defines the underlying mechanisms that give rise to different distribution shifts and contains 9 OOD datasets covering 3 VDU-related tasks: document information extraction, classification, and question answering. We then evaluate the robustness of the 5 latest pre-trained VDU models and 2 typical OOD generalization algorithms on these OOD datasets and perform a fine-grained analysis of the results. The experiments demonstrate a significant performance gap between the in-distribution (ID) and OOD settings for document images, and show that fine-grained analysis of distribution shifts can reveal the brittle nature of existing pre-trained VDU models and OOD generalization algorithms. The code and datasets for our Do-GOOD benchmark can be found at https://github.com/MAEHCM/Do-GOOD.
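The ID/OOD performance gap mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the Do-GOOD code base: the function names `accuracy` and `performance_gap`, the choice of accuracy as the metric, and the toy labels are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that exactly match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def performance_gap(id_score: float, ood_score: float) -> float:
    """ID score minus OOD score; a larger value means a less robust model."""
    return id_score - ood_score


# Hypothetical toy data for a document classification task (not real results).
id_labels = ["invoice", "letter", "memo", "form"]
id_preds = ["invoice", "letter", "memo", "form"]
ood_labels = ["invoice", "letter", "memo", "form"]
ood_preds = ["invoice", "memo", "memo", "letter"]

id_acc = accuracy(id_preds, id_labels)
ood_acc = accuracy(ood_preds, ood_labels)
gap = performance_gap(id_acc, ood_acc)
print(f"ID accuracy: {id_acc:.2f}, OOD accuracy: {ood_acc:.2f}, gap: {gap:.2f}")
```

In practice the same model checkpoint would be scored once on an ID test split and once on each OOD dataset, with a task-appropriate metric in place of plain accuracy where needed.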

information retrieval, machine learning, natural language

Country:
- North America > United States
  - District of Columbia > Washington (0.04)
  - New York > New York County
    - New York City (0.04)
- Asia
  - Singapore (0.04)
  - China
    - Sichuan Province > Chengdu (0.04)
    - Beijing > Beijing (0.04)
    - Guangdong Province > Shenzhen (0.04)

Genre:
- Research Report > New Finding (0.88)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.94)
  - Artificial Intelligence
    - Machine Learning > Neural Networks (0.46)
    - Natural Language
      - Text Processing (0.68)
      - Information Retrieval (0.46)
