ExTTNet: A Deep Learning Algorithm for Extracting Table Texts from Invoice Images
–arXiv.org Artificial Intelligence
In this work, product tables in invoices are extracted automatically by a deep learning model named ExTTNet. First, text is obtained from invoice images using Optical Character Recognition (OCR); the Tesseract OCR engine [37] is used for this step. Next, additional features are derived with feature extraction methods to improve accuracy. Each text item produced by OCR is then labeled according to whether or not it is a table element. The model is a multilayer artificial neural network. Training was carried out on an Nvidia RTX 3090 graphics card and took $162$ minutes. The trained model achieves an F1 score of $0.92$.
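The pipeline described in the abstract (OCR tokens, derived features, a binary table/non-table label, and an F1 score) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `OcrToken` schema, the particular features in `token_features`, and the sample values are hypothetical and are not taken from the paper, which does not list its features in this abstract.

```python
from dataclasses import dataclass


@dataclass
class OcrToken:
    """One text item as an OCR engine might return it (hypothetical schema)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def token_features(tok: OcrToken, page_w: int, page_h: int) -> list[float]:
    """Derive a few illustrative features from one OCR token.

    These features are assumptions for the sketch, not the paper's feature set.
    """
    n_chars = max(len(tok.text), 1)
    n_digits = sum(ch.isdigit() for ch in tok.text)
    return [
        tok.left / page_w,      # normalized horizontal position
        tok.top / page_h,       # normalized vertical position
        tok.width / page_w,     # normalized box width
        tok.height / page_h,    # normalized box height
        n_digits / n_chars,     # fraction of characters that are digits
        float(len(tok.text)),   # text length in characters
    ]


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for the positive class (1 = table element, 0 = not a table element)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tokens on a 1000 x 1400 pixel page.
tokens = [
    OcrToken("Invoice", 50, 40, 200, 30),
    OcrToken("12.50", 600, 500, 80, 20),
]
features = [token_features(t, 1000, 1400) for t in tokens]

# Hypothetical labels and predictions for five tokens.
score = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In a full system the feature vectors would be fed to a classifier such as the multilayer network the abstract mentions; the sketch stops at feature construction and evaluation to stay dependency-free.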
Feb-3-2024
- Country:
  - Asia
    - Middle East > Republic of Türkiye
      - Istanbul Province > Istanbul (0.04)
      - Trabzon Province > Trabzon (0.04)
      - İzmir Province > İzmir (0.05)
    - Russia (0.04)
  - Europe
    - Finland > Uusimaa
      - Helsinki (0.04)
    - France > Auvergne-Rhône-Alpes
    - Germany (0.04)
    - Italy > Veneto
      - Venice (0.04)
    - Middle East > Republic of Türkiye
      - Istanbul Province > Istanbul (0.04)
    - Russia > Northwestern Federal District
      - Leningrad Oblast > Saint Petersburg (0.04)
    - Spain > Aragón
      - Zaragoza Province > Zaragoza (0.04)
    - United Kingdom > England
      - Greater London > London (0.04)
      - North Yorkshire > Middlesbrough (0.04)
      - Somerset > Bath (0.04)
  - North America > United States
    - New York > New York County > New York City (0.05)
- Genre:
  - Research Report > New Finding (0.34)
- Technology: