Evaluating Out-of-Distribution Performance on Document Image Classifiers

Larson, Stefan, Lim, Gordon, Ai, Yutong, Kuang, David, Leach, Kevin

Jan-18-2023–arXiv.org Artificial Intelligence

The ability of a document classifier to handle inputs drawn from a distribution different from the training distribution is crucial for robust deployment and generalizability. The RVL-CDIP corpus is the de facto standard benchmark for document classification, yet to our knowledge no study that uses this corpus includes evaluation on out-of-distribution documents. In this paper, we curate and release a new benchmark for evaluating the out-of-distribution performance of document classifiers. The benchmark consists of two types of documents: those that do not belong to any of the 16 in-domain RVL-CDIP categories (RVL-CDIP-O), and those that belong to one of the 16 in-domain categories yet are drawn from a distribution different from that of the original RVL-CDIP dataset (RVL-CDIP-N). While prior work on document classification reports high accuracy on in-domain RVL-CDIP documents, we find that these models exhibit accuracy drops of roughly 15-30% on the out-of-distribution RVL-CDIP-N benchmark, and further struggle to distinguish between in-domain RVL-CDIP-N and out-of-domain RVL-CDIP-O inputs. The benchmark gives researchers a new resource for analyzing the out-of-distribution performance of document classifiers. The data can be found at https://github.com/gxlarson/rvl-cdip-ood.
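
The two evaluations described above (the accuracy drop on RVL-CDIP-N, and the separation of RVL-CDIP-N from RVL-CDIP-O inputs) can be illustrated with a minimal sketch. The abstract does not say how confidence is computed or which detection metric is reported. This sketch assumes maximum softmax probability as the confidence score and AUROC as the separation metric. The function names are hypothetical and are not taken from the released repository.

```python
import numpy as np


def max_softmax_confidence(logits):
    """Maximum softmax probability per row of an (n, k) logit array.

    Assumption: confidence is the top softmax probability; the abstract
    does not specify the scoring function.
    """
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)


def accuracy(logits, labels):
    """Fraction of rows whose arg-max class equals the label."""
    predictions = np.asarray(logits).argmax(axis=1)
    return float((predictions == np.asarray(labels)).mean())


def accuracy_drop(in_dist_acc, shifted_acc):
    """Absolute drop in accuracy from the in-distribution test set
    to a shifted set such as RVL-CDIP-N."""
    return in_dist_acc - shifted_acc


def auroc(in_scores, out_scores):
    """Probability that a random in-domain confidence exceeds a random
    out-of-domain confidence, counting ties as one half."""
    a = np.asarray(in_scores, dtype=float)[:, None]
    b = np.asarray(out_scores, dtype=float)[None, :]
    wins = (a > b).sum()
    ties = (a == b).sum()
    return float((wins + 0.5 * ties) / (a.size * b.size))
```

In use, one would run the classifier on the three sets, pass the RVL-CDIP-N logits and labels to accuracy, and pass the confidences for RVL-CDIP-N (in-domain) and RVL-CDIP-O (out-of-domain) to auroc. An AUROC near 0.5 would mean the confidence score barely separates the two.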

confidence score, data mining, machine learning, (19 more...)
