AITopics

2507.18264

Country:

North America > United States (1.00)
Asia (1.00)

Genre: Research Report (0.64)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Giovannini, Simone, Coppini, Fabio, Gemelli, Andrea, Marinai, Simone

BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations

arXiv.org Artificial Intelligence Jan-6-2025

We present a unified dataset for document Question Answering (QA), obtained by combining several public datasets related to Document AI and visually rich document understanding (VRDU). Our main contribution is twofold: on the one hand, we reformulate existing Document AI tasks, such as Information Extraction (IE), as a Question Answering task, making the dataset a suitable resource for training and evaluating Large Language Models; on the other hand, we release the OCR of all the documents and include the exact position of each answer in the document image as a bounding box. Using this dataset, we explore the impact of different prompting techniques (some of which include bounding box information) on the performance of open-weight models, identifying the most effective approaches for document comprehension.
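A minimal Python sketch of the reformulation idea described in the abstract: one key-value Information Extraction annotation becomes a QA sample whose answer carries a bounding box, and a prompt builder optionally inlines OCR word coordinates. The class names (`OcrWord`, `QASample`), the question template, and the `(x0, y0, x1, y1)` box convention are illustrative assumptions, not the BoundingDocs schema or the prompts evaluated in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed convention: (x0, y0, x1, y1) in page pixels


@dataclass
class OcrWord:
    text: str
    box: Box


@dataclass
class QASample:
    question: str
    answer: str
    answer_box: Box  # where the answer appears on the page image


def union_box(boxes: List[Box]) -> Box:
    """Smallest box enclosing all the given boxes."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


def ie_to_qa(key: str, value_words: List[OcrWord]) -> QASample:
    """Rephrase an IE (key -> value) annotation as a question with a located answer."""
    # Hypothetical template; a real dataset would use curated or generated questions.
    question = f"What is the {key.replace('_', ' ')} in this document?"
    answer = " ".join(w.text for w in value_words)
    return QASample(question, answer, union_box([w.box for w in value_words]))


def build_prompt(question: str, words: List[OcrWord], with_boxes: bool) -> str:
    """Plain-text prompt; optionally attach each OCR word's coordinates."""
    if with_boxes:
        context = "\n".join(f"{w.text} {list(w.box)}" for w in words)
    else:
        context = " ".join(w.text for w in words)
    return f"Document:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Hypothetical OCR output for a tiny invoice page.
    page = [
        OcrWord("Invoice", (10, 10, 80, 30)),
        OcrWord("Total:", (10, 200, 60, 220)),
        OcrWord("42.00", (70, 200, 120, 220)),
        OcrWord("EUR", (125, 200, 160, 220)),
    ]
    sample = ie_to_qa("total_amount", page[2:])
    print(sample)
    print(build_prompt(sample.question, page, with_boxes=True))
```

Toggling `with_boxes` yields a text-only prompt and a layout-aware one from the same sample, which is one simple way to set up the kind of prompting comparison the abstract mentions.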

large language model, machine learning, question answering

2501.03403

Country:

Europe > United Kingdom > England > Staffordshire > Stoke-on-Trent (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial Intelligence Aug-29-2023

Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis

Kastanas, Sotirios, Tan, Shaomu, He, Yi

Document AI aims to automatically analyze documents by leveraging natural language processing and computer vision techniques. One of the major tasks of Document AI is document layout analysis, which structures document pages by interpreting the content and spatial relationships of layout, image, and text. This task can be image-centric, where the aim is to identify and label regions such as authors and paragraphs, or text-centric, where the focus is on classifying individual words in a document. Although increasingly sophisticated methods for layout analysis have been proposed, doubts remain about how far their findings generalize to a broader context. Specifically, prior work developed systems based on very different architectures, such as transformer-based models, graph-based models, and CNNs, but no work has directly compared the effectiveness of these models. Moreover, while language-independent Document AI models capable of knowledge transfer have been developed, the degree to which they can effectively transfer knowledge remains to be investigated. In this study, we aim to fill these gaps by conducting a comparative evaluation of state-of-the-art models in document layout analysis and investigating the potential of cross-lingual layout analysis by utilizing machine translation techniques.
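The two framings differ mainly in output granularity. In this hypothetical Python sketch (not code from any of the compared models), per-word labels from a text-centric model are collapsed into labeled regions of the kind an image-centric model predicts. It assumes words arrive in reading order and that consecutive words sharing a label form one region; pages with multiple columns would need extra splitting.

```python
from itertools import groupby
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)
Region = Tuple[str, Box, str]    # (label, enclosing box, joined text)


def words_to_regions(words: List[str], boxes: List[Box], labels: List[str]) -> List[Region]:
    """Merge runs of consecutive same-label words into labeled regions.

    Assumes words are in reading order; a fuller system would also split a run
    at column breaks or large vertical gaps.
    """
    regions: List[Region] = []
    for label, group in groupby(zip(words, boxes, labels), key=lambda item: item[2]):
        run = list(group)
        run_boxes = [box for _, box, _ in run]
        # Enclosing box of every word in the run.
        merged = (
            min(b[0] for b in run_boxes),
            min(b[1] for b in run_boxes),
            max(b[2] for b in run_boxes),
            max(b[3] for b in run_boxes),
        )
        regions.append((label, merged, " ".join(word for word, _, _ in run)))
    return regions


if __name__ == "__main__":
    # Text-centric output: one label per OCR word (hypothetical example page).
    words = ["Deep", "Layout", "Jane", "Doe", "We", "study"]
    boxes = [(10, 10, 60, 30), (65, 10, 130, 30), (10, 40, 45, 55),
             (50, 40, 80, 55), (10, 70, 30, 85), (35, 70, 80, 85)]
    labels = ["title", "title", "author", "author", "paragraph", "paragraph"]
    for region in words_to_regions(words, boxes, labels):
        print(region)
```

Because `itertools.groupby` only groups adjacent items, two paragraphs separated by a title stay separate regions, which matches the reading-order assumption.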

convolutional neural network, document layout analysis, graph-based model

2308.15517

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Feb-5-2023, 12:40:20 GMT

AI and Automation: Solving Pain Points in AP/AR

As businesses continue to look for ways to streamline their operations and reduce the risk of errors, many are turning to document AI to automate their accounts payable (AP) and accounts receivable (AR) processes. Payment AI, automation, and digitization software can greatly reduce errors and improve the efficiency of financial transactions. According to the Federal Reserve, 75% of bills are processed manually, which is slow, error-prone, and frustrating for both billers and payers. Businesses send one another a lot of PDF documents, but it is difficult to extract and organize important information from PDFs, especially when they are not structured consistently. This can lead to delays and errors, which are costly for the business and frustrating for customers.

ai and automation, document ai, streamline

Industry: Banking & Finance (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.33)

#artificialintelligence Jan-29-2023, 13:15:22 GMT

The Next ChatGPT Revolution: Intelligent Document Processing

ChatGPT, the state-of-the-art language model developed by OpenAI, is poised to have a significant impact on the B2B industry. In intelligent document processing, this technology has the potential to disrupt traditional business processes and open up new opportunities for companies across a wide range of industries. One of the key areas where ChatGPT is likely to have an impact is automating routine tasks and customer interactions. Another area where it is likely to be disruptive is the generation of written content: it can be used to quickly and accurately generate reports, product descriptions, and other written materials.

information, large language model, machine learning

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Jan-9-2023, 18:30:52 GMT

More powerful W2 and payslip processing with Document AI

Documents like payslips and W2s are crucial to processes such as employment and income verification for mortgage loans, personal loans, personal finance, and benefits processing. Unfortunately, efficiently extracting data from these documents at scale can be challenging and time-consuming, with many organizations relying on manual examination of documents or automated approaches that don't adequately capture the document data needed for given tasks. Google Cloud built Document AI to remove these barriers, empowering customers to deploy powerful machine learning models to more quickly process documents, save money, and discover insights. We're excited to expand Document AI's capabilities with the recent release of improved pre-trained models for W2s and payslips, built on Document AI Workbench.

artificial intelligence, document ai, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligence Oct-29-2022, 05:49:03 GMT

Cybersecurity Watch (La veille de la cybersécurité)

Technological breakthroughs have revolutionized the way individuals work and conduct business. For instance, people must develop skills that will help them find new jobs, because it is predicted that automation could replace up to a third of all jobs by 2030. To see how crucial document AI will be in the future, consider this: did you know that 70% of enterprise documents are free-form text, such as written documents and emails? This is where document AI comes in: software that automatically extracts information and decodes text from all of your documents without human input. Machine learning is what has made document AI possible.

document ai, extract information, veille

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.87)

#artificialintelligence Oct-28-2022, 04:50:11 GMT

What is Document AI? How Machine Learning Powers Some of the Document AI Platforms

document ai, neural network, platform

Industry: Information Technology > Services (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Jul-19-2022, 00:15:32 GMT

Why Document AI will be at the forefront of the workplace

In a dwindling labour market, it's harder than ever to retain employees, as millions have quit their jobs over the past year. The Great Resignation, reshuffle or reset (call it what you will) is taking a toll on businesses, and we can't erase that fact. Some say employers failed by often treating workers as dispensable and therefore couldn't tempt them back once lockdowns lifted. Other employees are tired, even burnt out. The stresses of the pandemic weighed heavily on the way we work and shone a light on the importance of a healthy work-life balance. In our latest research, 91% of UK employees admit they waste up to 8 hours a week searching documents for the information they need to do their jobs.

customer experience, forefront, uk employee

Technology: Information Technology > Artificial Intelligence (1.00)

arXiv.org Artificial Intelligence Jul-19-2022

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

Huang, Yupan, Lv, Tengchao, Cui, Lei, Lu, Yutong, Wei, Furu

Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis. The code and models are publicly available at https://aka.ms/layoutlmv3.
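The word-patch alignment target can be sketched in Python as follows. This is an illustrative reconstruction, not the released LayoutLMv3 code. It assumes word boxes normalized to a 0-1000 grid (as in the LayoutLM family), a 14 x 14 grid of image patches, that a word counts as aligned only when none of the patches it overlaps is masked, and that masked words are left out of the objective. The function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Box = Tuple[int, int, int, int]  # word box (x0, y0, x1, y1), assumed normalized to 0-1000


def overlapping_patches(box: Box, grid: int = 14, scale: int = 1000) -> Set[int]:
    """Indices (row * grid + col) of the image patches a normalized word box touches."""
    x0, y0, x1, y1 = box

    def to_cell(v: int) -> int:
        # Map a coordinate to a patch row/column, clamping the 1000 edge into range.
        return min(v * grid // scale, grid - 1)

    cols = range(to_cell(x0), to_cell(x1) + 1)
    rows = range(to_cell(y0), to_cell(y1) + 1)
    return {r * grid + c for r in rows for c in cols}


def wpa_labels(
    boxes: List[Box],
    masked_text: Set[int],
    masked_patches: Set[int],
    grid: int = 14,
) -> List[Optional[int]]:
    """Alignment targets per word: 1 = aligned, 0 = unaligned, None = excluded."""
    labels: List[Optional[int]] = []
    for i, box in enumerate(boxes):
        if i in masked_text:
            labels.append(None)  # masked words contribute no alignment target
        elif overlapping_patches(box, grid) & masked_patches:
            labels.append(0)  # word visible but some patch under it is masked
        else:
            labels.append(1)  # word and all of its patches are visible
    return labels


if __name__ == "__main__":
    # Hypothetical page with three words; word 2 is text-masked, patch 105 is image-masked.
    boxes = [(0, 0, 50, 50), (500, 500, 560, 540), (900, 900, 1000, 1000)]
    print(wpa_labels(boxes, masked_text={2}, masked_patches={105}))
```

In a training loop these targets would feed a binary classifier over the unmasked text positions; the sketch stops at label construction.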

layoutlmv3, objective, representation

2204.08387

Country:

Asia (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.66)