Zero-shot Task Transfer for Invoice Extraction via Class-aware QA Ensemble
Damodaran, Prithiviraj; Singh, Prabhkaran; Achankuju, Josemon
arXiv.org Artificial Intelligence
We present VESPA, an intentionally simple yet novel zero-shot system for layout-, locale-, and domain-agnostic document extraction. Despite the availability of large corpora of documents, the lack of labeled and validated datasets makes it challenging to discriminatively train document extraction models for enterprises. We show that this problem can be addressed by simply transferring the information extraction (IE) task to a natural language question-answering (QA) task, without engineering task-specific architectures. We demonstrate the effectiveness of our system by evaluating it on a closed corpus of real-world retail and tax invoices with multiple complex layouts, domains, and geographies. The empirical evaluation shows that our system outperforms four prominent commercial invoice solutions that use discriminatively trained models with architectures specifically crafted for invoice extraction. With zero upfront human annotation or training, we extracted six fields with an average F1 of 87.50.
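The idea of recasting field extraction as question answering can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names, the question wording, the `regex_qa` stand-in models, and the highest-score selection rule are hypothetical and are not taken from VESPA, whose ensemble and class-aware logic are not described in this abstract. In practice each `QAModel` would wrap a pretrained extractive QA model rather than a regular expression.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Optional


class Answer(NamedTuple):
    text: str
    score: float  # confidence reported by the model, in [0, 1]


# A QA model maps (question, context) to an Answer, or None if it abstains.
QAModel = Callable[[str, str], Optional[Answer]]

# Hypothetical mapping from invoice fields to natural language questions.
FIELD_QUESTIONS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "invoice_date": "What is the invoice date?",
    "total_amount": "What is the total amount?",
}


def regex_qa(patterns: Dict[str, str], score: float) -> QAModel:
    """Build a toy stand-in for a pretrained QA model (illustration only)."""
    def model(question: str, context: str) -> Optional[Answer]:
        pattern = patterns.get(question)
        if pattern is None:
            return None  # this model has no answer for the question
        match = re.search(pattern, context, flags=re.IGNORECASE)
        return Answer(match.group(1).strip(), score) if match else None
    return model


def extract_fields(
    context: str,
    models: List[QAModel],
    questions: Dict[str, str] = FIELD_QUESTIONS,
) -> Dict[str, Optional[Answer]]:
    """Ask every model every field question; keep the top-scoring answer."""
    results: Dict[str, Optional[Answer]] = {}
    for field, question in questions.items():
        candidates = [m(question, context) for m in models]
        answered = [a for a in candidates if a is not None]
        results[field] = max(answered, key=lambda a: a.score) if answered else None
    return results


# Two toy "models" with different coverage and confidence.
model_a = regex_qa({
    "What is the invoice number?": r"invoice\s*(?:no\.?|number)\s*[:#]?\s*([A-Z0-9-]+)",
    "What is the total amount?": r"total\s*:?\s*([\d.,]+)",
}, score=0.9)
model_b = regex_qa({
    "What is the invoice date?": r"date\s*:?\s*(\d{4}-\d{2}-\d{2})",
    "What is the total amount?": r"amount due\s*:?\s*([\d.,]+)",
}, score=0.6)

invoice_text = "Invoice No: INV-1042\nDate: 2021-08-13\nTotal: 118.00"
fields = extract_fields(invoice_text, [model_a, model_b])
```

No field-specific training or labeled invoices are involved in this sketch: adding a field only requires adding a question, which is the property the zero-shot framing relies on.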
Aug-13-2021
- Country:
  - Africa > Cameroon
    - Gulf of Guinea (0.04)
  - Asia
    - Japan > Honshū
      - Kansai > Kyoto Prefecture > Kyoto (0.04)
    - Middle East > Jordan (0.04)
  - Europe > Denmark
    - Capital Region > Copenhagen (0.04)
  - North America > United States
    - District of Columbia > Washington (0.04)
    - Louisiana (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.04)
    - Texas > Wichita County (0.04)
  - Oceania > New Zealand (0.04)
- Genre:
  - Research Report (0.50)
- Technology: