AITopics | layoutlm

Collaborating Authors

layoutlm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Improving Applicability of Deep Learning based Token Classification models during Training

Mehra, Anket, Prieß, Malte, Himstedt, Marian

arXiv.org Artificial IntelligenceMar-28-2025

This paper shows that further evaluation metrics during model training are needed to decide about its applicability in inference. As an example, a LayoutLM-based model is trained for token classification in documents. The documents are German receipts. We show that conventional classification metrics, represented by the F1-Score in our experiments, are insufficient for evaluating the applicability of machine learning models in practice. To address this problem, we introduce a novel metric, Document Integrity Precision (DIP), as a solution for visual document understanding and the token classification task. To the best of our knowledge, nothing comparable has been introduced in this context. DIP is a rigorous metric, describing how many documents of the test dataset require manual interventions. It enables AI researchers and software developers to conduct an in-depth investigation of the level of process automation in business software. In order to validate DIP, we conduct experiments with our created models to highlight and analyze the impact and relevance of DIP to evaluate if the model should be deployed or not in different training settings. Our results demonstrate that existing metrics barely change for isolated model impairments, whereas DIP indicates that the model requires substantial human interventions in deployment. The larger the set of entities being predicted, the less sensitive conventional metrics are, entailing poor automation quality. DIP, in contrast, remains a single value to be interpreted for entire entity sets. This highlights the importance of having metrics that focus on the business task for model training in production. Since DIP is created for the token classification task, more research is needed to find suitable metrics for other training tasks.

applicability, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2504.01028

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > Kings County > New York City (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Software (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Training LayoutLM from Scratch for Efficient Named-Entity Recognition in the Insurance Domain

Uthayasooriyar, Benno, Ly, Antoine, Vermet, Franck, Corro, Caio

arXiv.org Artificial IntelligenceDec-12-2024

Generic pre-trained neural networks may struggle to produce good results in specialized domains like finance and insurance. This is due to a domain mismatch between training data and downstream tasks, as in-domain data are often scarce due to privacy constraints. In this work, we compare different pre-training strategies for LayoutLM. We show that using domain-relevant documents improves results on a named-entity recognition (NER) problem using a novel dataset of anonymized insurance-related financial documents called Payslips. Moreover, we show that we can achieve competitive results using a smaller and faster model.

artificial intelligence, information retrieval, natural language, (15 more...)

arXiv.org Artificial Intelligence

2412.09341

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Add feedback

Improve Academic Query Resolution through BERT-based Question Extraction from Images

Kamal, Nidhi, Yadav, Saurabh, Singh, Jorawar, Avasthi, Aditi

arXiv.org Artificial IntelligenceApr-28-2024

Providing fast and accurate resolution to the student's query is an essential solution provided by Edtech organizations. This is generally provided with a chat-bot like interface to enable students to ask their doubts easily. One preferred format for student queries is images, as it allows students to capture and post questions without typing complex equations and information. However, this format also presents difficulties, as images may contain multiple questions or textual noise that lowers the accuracy of existing single-query answering solutions. In this paper, we propose a method for extracting questions from text or images using a BERT-based deep learning model and compare it to the other rule-based and layout-based methods. Our method aims to improve the accuracy and efficiency of student query resolution in Edtech organizations.

extraction, query, student, (9 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IATMSI60426.2024.10502904

2405.01587

Country:

Asia > India > Karnataka > Bengaluru (0.06)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.50)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

DocumentNet: Bridging the Data Gap in Document Pre-Training

Yu, Lijun, Miao, Jin, Sun, Xiaoyu, Chen, Jiayi, Hauptmann, Alexander G., Dai, Hanjun, Wei, Wei

arXiv.org Artificial IntelligenceOct-26-2023

Document understanding tasks, in particular, Visually-rich Document Entity Retrieval (VDER), have gained significant attention in recent years thanks to their broad applications in enterprise AI. However, publicly available data have been scarce for these tasks due to strict privacy constraints and high annotation costs. To make things worse, the non-overlapping entity spaces from different datasets hinder the knowledge transfer between document types. In this paper, we propose a method to collect massive-scale and weakly labeled data from the web to benefit the training of VDER models. The collected dataset, named DocumentNet, does not depend on specific document types or entity sets, making it universally applicable to all VDER tasks. The current DocumentNet consists of 30M documents spanning nearly 400 document types organized in a four-level ontology. Experiments on a set of broadly adopted VDER tasks show significant improvements when DocumentNet is incorporated into the pre-training for both classic and few-shot learning settings. With the recent emergence of large language models (LLMs), DocumentNet provides a large data source to extend their multi-modal capabilities for VDER.

dataset, documentnet, uniformer, (16 more...)

arXiv.org Artificial Intelligence

2306.08937

Country:

North America > United States > Virginia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.82)

Industry:

Law (1.00)
Banking & Finance > Real Estate (1.00)
Banking & Finance > Insurance (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

Add feedback

Long-Range Transformer Architectures for Document Understanding

Douzon, Thibault, Duffner, Stefan, Garcia, Christophe, Espinas, Jérémy

arXiv.org Artificial IntelligenceSep-11-2023

Since their release, Transformers have revolutionized many fields from Natural Language Understanding to Computer Vision. Document Understanding (DU) was not left behind with first Transformer based models for DU dating from late 2019. However, the computational complexity of the self-attention operation limits their capabilities to small sequences. In this paper we explore multiple strategies to apply Transformer based models to long multi-page documents. We introduce 2 new multi-modal (text + layout) long-range models for DU. They are based on efficient implementations of Transformers for long sequences. Long-range models can process whole documents at once effectively and are less impaired by the document's length. We compare them to LayoutLM, a classical Transformer adapted for DU and pre-trained on millions of documents. We further propose 2D relative attention bias to guide self-attention towards relevant tokens without harming model efficiency. We observe improvements on multi-page business documents on Information Retrieval for a small performance cost on smaller sequences. Relative 2D attention revealed to be effective on dense text for both normal and long-range models.

layoutcosformer, layoutlm, long-range transformer architecture, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-41501-2_4

2309.05503

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(7 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches

Sven, Najem-Meyer, Matteo, Romanello

arXiv.org Artificial IntelligenceDec-12-2022

Page layout analysis is a fundamental step in document processing which enables to segment a page into regions of interest. With highly complex layouts and mixed scripts, scholarly commentaries are text-heavy documents which remain challenging for state-of-the-art models. Their layout considerably varies across editions and their most important regions are mainly defined by semantic rather than graphical characteristics such as position or appearance. This setting calls for a comparison between textual, visual and hybrid approaches. We therefore assess the performances of two transformers (LayoutLMv3 and RoBERTa) and an objection-detection network (YOLOv5). If results show a clear advantage in favor of the latter, we also list several caveats to this finding. In addition to our experiments, we release a dataset of ca. 300 annotated pages sampled from 19th century commentaries.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2212.13924

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(4 more...)

Genre: Research Report > New Finding (0.66)

Industry: Media > Publishing (0.62)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Understanding Long Documents with Different Position-Aware Attentions

Pham, Hai, Wang, Guoxin, Lu, Yijuan, Florencio, Dinei, Zhang, Cha

arXiv.org Artificial IntelligenceAug-17-2022

Despite several successes in document understanding, the practical task for long document understanding is largely under-explored due to several challenges in computation and how to efficiently absorb long multimodal input. Most current transformer-based approaches only deal with short documents and employ solely textual information for attention due to its prohibitive computation and memory limit. To address those issues in long document understanding, we explore different approaches in handling 1D and new 2D position-aware attention with essentially shortened context. Experimental results show that our proposed models have advantages for this task based on various evaluation metrics. Furthermore, our model makes changes only to the attention and thus can be easily adapted to any transformer-based architecture.

dataset, information, long document, (13 more...)

arXiv.org Artificial Intelligence

2208.08201

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback

Information Extraction from Visually Rich Documents with Font Style Embeddings

Oussaid, Ismail, Vanhuffel, William, Ratnamogan, Pirashanth, Hajaiej, Mhamed, Mathey, Alexis, Gilles, Thomas

arXiv.org Artificial IntelligenceAug-12-2022

Information extraction (IE) from documents is an intensive area of research with a large set of industrial applications. Current state-of-the-art methods focus on scanned documents with approaches combining computer vision, natural language processing and layout representation. We propose to challenge the usage of computer vision in the case where both token style and visual representation are available (i.e native PDF documents). Our experiments on three real-world complex datasets demonstrate that using token style attributes based embedding instead of a raw visual embedding in LayoutLM model is beneficial. Depending on the dataset, such an embedding yields an improvement of 0.18% to 2.29% in the weighted F1-score with a decrease of 30.7% in the final number of trainable parameters of the model, leading to an improvement in both efficiency and effectiveness.

dataset, extraction, information, (12 more...)

arXiv.org Artificial Intelligence

2111.04045

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre: Research Report > Promising Solution (0.66)

Industry: Banking & Finance (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.75)
Information Technology > Data Science > Data Mining > Text Mining (0.65)

Add feedback

Value Retrieval with Arbitrary Queries for Form-like Documents

Gao, Mingfei, Xue, Le, Ramaiah, Chetan, Xing, Chen, Xu, Ran, Xiong, Caiming

arXiv.org Artificial IntelligenceDec-14-2021

We propose value retrieval with arbitrary queries for form-like documents to reduce human effort of processing forms. Unlike previous methods that only address a fixed set of field items, our method predicts target value for an arbitrary query based on the understanding of layout and semantics of a form. To further boost model performance, we propose a simple document language modeling (simpleDLM) strategy to improve document understanding on large-scale model pre-training. Experimental results show that our method outperforms our baselines significantly and the simpleDLM further improves our performance on value retrieval by around 17\% F1 score compared with the state-of-the-art pre-training method. Code will be made publicly available.

baseline, ocr word, query, (13 more...)

arXiv.org Artificial Intelligence

2112.0782

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.36)

Add feedback

Filters

Collaborating Authors

layoutlm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Improving Applicability of Deep Learning based Token Classification models during Training

Training LayoutLM from Scratch for Efficient Named-Entity Recognition in the Insurance Domain

Improve Academic Query Resolution through BERT-based Question Extraction from Images

DocumentNet: Bridging the Data Gap in Document Pre-Training

Long-Range Transformer Architectures for Document Understanding

Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches

Top 5 Interview Questions on Multi-modal Transformers

Understanding Long Documents with Different Position-Aware Attentions

Information Extraction from Visually Rich Documents with Font Style Embeddings

Value Retrieval with Arbitrary Queries for Form-like Documents