AITopics | doclaynet

Docling Technical Report

Auer, Christoph, Lysak, Maksym, Nassar, Ahmed, Dolfi, Michele, Livathinos, Nikolaos, Vagenas, Panos, Ramis, Cesar Berrospi, Omenetti, Matteo, Lindlbauer, Fabian, Dinkla, Kasper, Mishra, Lokesh, Kim, Yusik, Gupta, Shubham, de Lima, Rafael Teixeira, Weber, Valery, Morin, Lucas, Meijer, Ingmar, Kuropiatnyk, Viktor, Staar, Peter W. J.

arXiv.org Artificial Intelligence, Aug-30-2024

This technical report introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware within a small resource budget. The code interface allows for easy extensibility and the addition of new features and models.

dataset, doclaynet, docling, (14 more...)

arXiv: 2408.09869

Country:

Europe > Switzerland > Zürich > Zürich (0.05)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Transportation (0.70)
Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Information Management (0.93)
(2 more...)

A Graphical Approach to Document Layout Analysis

Wang, Jilin, Krumdick, Michael, Tong, Baojia, Halim, Hamima, Sokolov, Maxim, Barda, Vadym, Vendryes, Delphine, Tanner, Chris

arXiv.org Artificial Intelligence, Aug-3-2023

Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network that is competitive with SOTA models on two challenging DLA datasets while being an order of magnitude smaller than existing models. In particular, the 4-million-parameter GLAM model outperforms the leading 140M+ parameter computer vision-based model on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two models achieves a new state of the art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making GLAM a favorable engineering choice for DLA tasks.
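To make the idea of "a PDF page as a structured graph" concrete, here is a minimal sketch. It is an illustrative assumption, not the construction used by GLAM: each text box extracted from the PDF metadata becomes a node, and each node is linked to its nearest neighbour in each of four directions. The names `TextBox` and `build_page_graph`, and the sample coordinates, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    """A text box from PDF metadata (hypothetical structure for illustration)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def build_page_graph(boxes):
    """Return sorted undirected edges (i, j) with i < j.

    Assumed rule: each box is linked to its nearest neighbour (by centre
    distance) in each of the four directions left, right, up, and down.
    """
    edges = set()
    for i, a in enumerate(boxes):
        ax, ay = a.center
        nearest = {}  # direction -> (squared distance, neighbour index)
        for j, b in enumerate(boxes):
            if i == j:
                continue
            bx, by = b.center
            dx, dy = bx - ax, by - ay
            # Assign the neighbour to the dominant axis of displacement.
            if abs(dx) >= abs(dy):
                direction = "right" if dx > 0 else "left"
            else:
                direction = "down" if dy > 0 else "up"
            dist = dx * dx + dy * dy
            if direction not in nearest or dist < nearest[direction][0]:
                nearest[direction] = (dist, j)
        for _, j in nearest.values():
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# A made-up page: one title above two side-by-side paragraphs.
page = [
    TextBox("Title", 50, 40, 550, 70),
    TextBox("Left column", 50, 100, 290, 300),
    TextBox("Right column", 310, 100, 550, 300),
]
edges = build_page_graph(page)
```

A graph neural network would then classify nodes (text, title, figure, and so on) and group them into segments; that part is beyond the scope of this sketch.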

data mining, machine learning, natural language, (20 more...)

arXiv: 2308.02051

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.94)
(2 more...)

DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis

Pfitzmann, Birgit, Auer, Christoph, Dolfi, Michele, Nassar, Ahmed S, Staar, Peter W J

arXiv.org Artificial Intelligence, Jun-2-2022

Accurate document layout analysis is a key requirement for high-quality PDF document conversion. With the recent availability of public, large ground-truth datasets such as PubLayNet and DocBank, deep-learning models have proven to be very effective at layout detection and segmentation. While these datasets are of adequate size to train such models, they severely lack layout variability since they are sourced only from scientific article repositories such as PubMed and arXiv. Consequently, the accuracy of the layout segmentation drops significantly when these models are applied to more challenging and diverse layouts. In this paper, we present DocLayNet, a new, publicly available, document-layout annotation dataset in COCO format. It contains 80863 manually annotated pages from diverse data sources to represent a wide variability in layouts. For each PDF page, the layout annotations provide labelled bounding boxes with a choice of 11 distinct classes. DocLayNet also provides a subset of double- and triple-annotated pages to determine the inter-annotator agreement. In multiple experiments, we provide baseline accuracy scores (in mAP) for a set of popular object detection models. We also demonstrate that these models fall approximately 10% behind the inter-annotator agreement. Furthermore, we provide evidence that DocLayNet is of sufficient size. Lastly, we compare models trained on PubLayNet, DocBank and DocLayNet, showing that layout predictions of the DocLayNet-trained models are more robust and thus the preferred choice for general-purpose document-layout analysis.
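As a rough illustration of what "labelled bounding boxes in COCO format" looks like in practice, the sketch below counts annotations per class in a COCO-style dictionary. The general COCO layout (`images`, `annotations`, `categories`, and `bbox` as `[x, y, width, height]`) is standard; the tiny `sample` record, its file name, and the helper names `count_labels` and `bbox_to_corners` are made up for illustration and are not taken from the DocLayNet files.

```python
import json
from collections import Counter


def count_labels(coco):
    """Count annotations per category name in a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])


def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x0, y0, x1, y1)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


# Hypothetical one-page sample; real datasets list many images and classes.
sample = json.loads("""
{
  "images": [{"id": 1, "file_name": "page_0001.png", "width": 1025, "height": 1025}],
  "categories": [{"id": 1, "name": "Text"}, {"id": 2, "name": "Title"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 2, "bbox": [100, 50, 800, 40]},
    {"id": 11, "image_id": 1, "category_id": 1, "bbox": [100, 120, 380, 600]},
    {"id": 12, "image_id": 1, "category_id": 1, "bbox": [520, 120, 380, 600]}
  ]
}
""")

label_counts = count_labels(sample)
title_corners = bbox_to_corners(sample["annotations"][0]["bbox"])
```

With a real annotation file, the same functions would be applied to the result of `json.load` on that file; the path and split names depend on how the dataset was downloaded.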

deep learning, document-layout analysis, machine learning, (3 more...)

doi: 10.1145/3534678.3539043

arXiv: 2206.01062

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)
