 document ai


Zero-shot OCR Accuracy of Low-Resourced Languages: A Comparative Analysis on Sinhala and Tamil

Jayatilleke, Nevidu, de Silva, Nisansa

arXiv.org Artificial Intelligence

Solving the problem of Optical Character Recognition (OCR) on printed text for Latin and its derivative scripts can now be considered settled due to the volumes of research done on English and other High-Resourced Languages (HRL). However, for Low-Resourced Languages (LRL) that use unique scripts, it remains an open problem. This study presents a comparative analysis of the zero-shot performance of six distinct OCR engines on two LRLs: Sinhala and Tamil. The selected engines include both commercial and open-source systems, aiming to evaluate the strengths of each category. The Cloud Vision API, Surya, Document AI, and Tesseract were evaluated for both Sinhala and Tamil, while Subasa OCR and EasyOCR were examined for only one language due to their limitations. The performance of these systems was rigorously analysed using five measurement techniques to assess accuracy at both the character and word levels. According to the findings, Surya delivered the best performance for Sinhala across all metrics, with a WER of 2.61%. Conversely, Document AI excelled across all metrics for Tamil, highlighted by a very low CER of 0.78%. In addition to the above analysis, we also introduce a novel synthetic Tamil OCR benchmarking dataset.
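The character- and word-level error rates quoted above (CER and WER) are conventionally computed as an edit distance divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function names are illustrative, and it assumes characters are compared as Unicode code points; for Sinhala and Tamil an evaluation might instead compare grapheme clusters, which would change the numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length.

    Assumption: characters are Unicode code points, not grapheme clusters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(cer("kitten", "sitting"))  # 0.5 (3 edits over 6 reference characters)
```

Both rates can exceed 1.0 when the OCR output is much longer than the reference, because insertions count as errors.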


BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations

Giovannini, Simone, Coppini, Fabio, Gemelli, Andrea, Marinai, Simone

arXiv.org Artificial Intelligence

We present a unified dataset for document Question-Answering (QA), obtained by combining several public datasets related to Document AI and visually rich document understanding (VRDU). Our main contribution is twofold: on the one hand, we reformulate existing Document AI tasks, such as Information Extraction (IE), into a Question-Answering task, making it a suitable resource for training and evaluating Large Language Models; on the other hand, we release the OCR of all the documents and include the exact position of the answer to be found in the document image as a bounding box. Using this dataset, we explore the impact of different prompting techniques (that might include bounding box information) on the performance of open-weight models, identifying the most effective approaches for document comprehension.
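To illustrate what a QA example with spatial annotations and a prompt that optionally carries bounding box information might look like, here is a small sketch. Everything in it is hypothetical: the class names, the field names, the `(x0, y0, x1, y1)` box convention, and the prompt layout are assumptions made for illustration and are not the dataset's actual schema or the authors' prompts.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class OcrWord:
    text: str
    box: Box


@dataclass
class QaExample:
    question: str
    answer: str
    answer_box: Box       # where the answer appears on the page
    words: List[OcrWord]  # OCR output for the page


def build_prompt(example: QaExample, include_boxes: bool = False) -> str:
    """Build a text prompt, optionally appending each word's box coordinates."""
    if include_boxes:
        context = "\n".join(
            f"{w.text} [{w.box[0]},{w.box[1]},{w.box[2]},{w.box[3]}]"
            for w in example.words
        )
    else:
        context = " ".join(w.text for w in example.words)
    return f"Document:\n{context}\n\nQuestion: {example.question}\nAnswer:"


example = QaExample(
    question="What is the total?",
    answer="42.00",
    answer_box=(70, 20, 110, 35),
    words=[OcrWord("Total", (10, 20, 60, 35)), OcrWord("42.00", (70, 20, 110, 35))],
)
print(build_prompt(example, include_boxes=True))
```

Comparing model answers with `include_boxes` on and off is one simple way to probe whether layout information in the prompt helps.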


Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis

Kastanas, Sotirios, Tan, Shaomu, He, Yi

arXiv.org Artificial Intelligence

Document AI aims to automatically analyze documents by leveraging natural language processing and computer vision techniques. One of the major tasks of Document AI is document layout analysis, which structures document pages by interpreting the content and spatial relationships of layout, image, and text. This task can be image-centric, wherein the aim is to identify and label various regions such as authors and paragraphs, or text-centric, where the focus is on classifying individual words in a document. Although there are increasingly sophisticated methods for improving layout analysis, doubts remain about the extent to which their findings can be generalized to a broader context. Specifically, prior work developed systems based on very different architectures, such as transformer-based, graph-based, and CNNs. However, no work has mentioned the effectiveness of these models in a comparative analysis. Moreover, while language-independent Document AI models capable of knowledge transfer have been developed, it remains to be investigated to what degree they can effectively transfer knowledge. In this study, we aim to fill these gaps by conducting a comparative evaluation of state-of-the-art models in document layout analysis and investigating the potential of cross-lingual layout analysis by utilizing machine translation techniques.


AI and Automation: Solving Pain Points in AP/AR

#artificialintelligence

As businesses continue to look for ways to streamline their operations and reduce the risk of errors, many are turning to document AI to automate their accounts payable (AP) and accounts receivable (AR) processes. Payment AI, automation, and digitization software can greatly reduce errors and improve the efficiency of financial transactions. According to the Federal Reserve, 75% of bills are manually processed, which is slow, error-prone, and frustrating for billers and payers. Businesses communicating with other businesses send a lot of PDF documents; however, it is difficult to extract and organize important information from PDF documents, especially if they're not structured in a consistent way. This can lead to delays and errors, which can be costly for the business and frustrating for customers.


The Next ChatGPT Revolution: Intelligent Document Processing

#artificialintelligence

ChatGPT, the state-of-the-art language model developed by OpenAI, is poised to have a significant impact on the B2B industry. This powerful technology has the potential to disrupt traditional business processes and open up new opportunities for companies across a wide range of industries when it comes to intelligent document processing. One of the key areas where ChatGPT is likely to have an impact is in automating routine tasks and customer interactions. Another area where ChatGPT is likely to be disruptive is in the generation of written content. This technology can be used to quickly and accurately generate reports, product descriptions, and other written materials.


More powerful W2 and payslip processing with Document AI

#artificialintelligence

Documents like payslips and W2s are crucial to processes such as employment and income verification for mortgage loans, personal loans, personal finance, and benefits processing. Unfortunately, efficiently extracting data from these documents at scale can be challenging and time-consuming, with many organizations relying on manual examination of documents or automated approaches that don't adequately capture the document data needed for given tasks. Google Cloud built Document AI to remove these barriers, empowering customers to deploy powerful machine learning models to more quickly process documents, save money, and discover insights. We're excited to expand Document AI's capabilities with the recent release of improved pre-trained models for W2s and payslips, built on Document AI Workbench.


La veille de la cybersécurité

#artificialintelligence

Technological breakthroughs have revolutionized the way individuals work and conduct business. For instance, people must develop skills that will enable them to find new jobs because it is predicted that automation could replace up to a third of all jobs by 2030. Consider the following to demonstrate how crucial document AI will be in the future: Did you know that 70% of enterprise documents are free-form text, such as written documents and emails? This indicates that the software used to automatically extract information and decode text from all of your documents has been processed (without human input). As a result, document AI has been made possible via machine learning.


What is Document AI? How Machine Learning Powers Some of the Document AI Platforms?

#artificialintelligence

Technological breakthroughs have revolutionized the way individuals work and conduct business. For instance, people must develop skills that will enable them to find new jobs because it is predicted that automation could replace up to a third of all jobs by 2030. Consider the following to demonstrate how crucial document AI will be in the future: Did you know that 70% of enterprise documents are free-form text, such as written documents and emails? This indicates that the software used to automatically extract information and decode text from all of your documents has been processed (without human input). As a result, document AI has been made possible via machine learning. Thanks to these apps, businesses may now understand document-based data and use it for various purposes.


Why Document AI will be at the forefront of the workplace

#artificialintelligence

In a dwindling labour market, it's harder than ever to retain employees as millions quit their jobs over the past year. The Great Resignation, reshuffle or reset – call it whatever you will, we can't erase the fact that it's taking a toll on businesses. Some say employers failed by often treating workers as dispensable, and therefore couldn't tempt them back once lockdowns lifted. Other employees are tired, even burnt out. The stresses of the pandemic weighed heavily on the way we work and shone a light on the importance of a healthy work-life balance. In our latest research, 91% of UK employees admit they waste up to 8 hours a week searching documents for information to do their jobs.


LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

Huang, Yupan, Lv, Tengchao, Cui, Lei, Lu, Yutong, Wei, Furu

arXiv.org Artificial Intelligence

Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis. The code and models are publicly available at https://aka.ms/layoutlmv3.
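The word-patch alignment objective described above can be pictured as a per-word binary labelling task. The sketch below only shows how such target labels might be derived. It is a simplification, not the released implementation: the function name and inputs are hypothetical, each word is assumed to map to exactly one image patch, and words whose text token is itself masked are assumed to be excluded from the objective.

```python
from typing import List, Optional


def wpa_labels(word_to_patch: List[int],
               text_masked: List[bool],
               patch_masked: List[bool]) -> List[Optional[int]]:
    """Derive word-patch alignment targets for one document.

    word_to_patch[i] is the index of the image patch covering word i
    (assumption: exactly one patch per word).
    Returns 1 if the word's patch is visible ("aligned"), 0 if the patch
    is masked ("unaligned"), and None for words whose text token is
    masked (assumed to be ignored by the loss).
    """
    labels: List[Optional[int]] = []
    for patch_idx, is_text_masked in zip(word_to_patch, text_masked):
        if is_text_masked:
            labels.append(None)
        else:
            labels.append(0 if patch_masked[patch_idx] else 1)
    return labels


# Three words, each over its own patch; word 1 is text-masked, patch 0 is image-masked.
print(wpa_labels([0, 1, 2], [False, True, False], [True, False, False]))
# [0, None, 1]
```

In training, a classifier over each unmasked word's representation would be fit to these labels, typically with a binary cross-entropy loss.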