An Augmentation Strategy for Visually Rich Documents

Xie, Jing, Wendt, James B., Zhou, Yichao, Ebner, Seth, Tata, Sandeep

Dec-22-2022–arXiv.org Artificial Intelligence

Many business workflows require extracting important fields from form-like documents (e.g., bank statements, bills of lading, and purchase orders). Recent techniques for automating this task work well only when trained on large datasets. In this work we propose a novel data augmentation technique to improve performance when training data is scarce, e.g., 10 to 250 documents. Our technique, which we call FieldSwap, swaps the key phrases of a source field for the key phrases of a target field, generating new synthetic examples of the target field for use in training. We demonstrate that this approach can improve extraction performance by 1 to 7 F1 points.
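The swapping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Example` record, the `field_swap` function, the field names, and the hand-written `key_phrases` dictionary are all hypothetical, and the sketch assumes that each example is a single line of text with one annotated value and that key phrases are already known for every field.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Example:
    """Hypothetical training example: one text line with one annotated value."""
    text: str   # line of document text containing a key phrase and a value
    field: str  # label of the annotated value
    value: str  # the annotated value itself


def field_swap(
    example: Example,
    key_phrases: Dict[str, List[str]],
    target_field: str,
) -> List[Example]:
    """Build synthetic examples of target_field from a source-field example.

    Each source key phrase found in the text is replaced by each target key
    phrase, and the annotated value is relabeled as the target field.
    """
    synthetic = []
    for source_phrase in key_phrases.get(example.field, []):
        if source_phrase not in example.text:
            continue  # nothing to swap for this phrase
        for target_phrase in key_phrases.get(target_field, []):
            # Replace only the first occurrence so the value stays untouched.
            new_text = example.text.replace(source_phrase, target_phrase, 1)
            synthetic.append(Example(new_text, target_field, example.value))
    return synthetic


# Assumed key phrases, written by hand for illustration only.
key_phrases = {
    "amount_due": ["Amount Due"],
    "total": ["Total", "Grand Total"],
}
source = Example("Amount Due: $120.00", "amount_due", "$120.00")
for synthetic_example in field_swap(source, key_phrases, "total"):
    print(synthetic_example)
```

In this sketch the value text is left unchanged and only the surrounding key phrase and the label change, so one labeled source example yields one synthetic target example per target key phrase.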

artificial intelligence, machine learning, natural language

Country:
- Europe (1.00)
- North America > United States
  - Montana > Roosevelt County (0.46)

Genre:
- Research Report (0.50)

Industry:
- Energy > Oil & Gas > Upstream (0.56)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning (1.00)
