Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches
Sven Najem-Meyer, Matteo Romanello
arXiv.org Artificial Intelligence
Page layout analysis is a fundamental step in document processing that segments a page into regions of interest. With highly complex layouts and mixed scripts, scholarly commentaries are text-heavy documents that remain challenging for state-of-the-art models. Their layout varies considerably across editions, and their most important regions are defined mainly by semantic characteristics rather than by graphical ones such as position or appearance. This setting calls for a comparison between textual, visual and hybrid approaches. We therefore assess the performance of two transformers (LayoutLMv3 and RoBERTa) and an object-detection network (YOLOv5). Although the results show a clear advantage for the latter, we also list several caveats to this finding. In addition to our experiments, we release a dataset of ca. 300 annotated pages sampled from 19th-century commentaries.
Dec-12-2022
- Country:
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
  - North America > United States
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.04)
  - Europe
    - Switzerland > Vaud
      - Lausanne (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Alpes-Maritimes > Nice (0.04)
    - Belgium > Flanders
      - Antwerp Province > Antwerp (0.04)
  - Asia > Japan
    - Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- Genre:
  - Research Report > New Finding (0.66)
- Industry:
  - Media > Publishing (0.62)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Natural Language (1.00)
    - Machine Learning (1.00)