Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark

Wu, Yu; Shu, Ke; Fischer, Jonas; Pivovarova, Lidia; Rosson, David; Mäkelä, Eetu; Tolonen, Mikko

Oct-29-2025–arXiv.org Artificial Intelligence

This paper presents a novel task of extracting Latin fragments from mixed-language historical documents with varied layouts. We benchmark and evaluate the performance of large foundation models against a multimodal dataset of 724 annotated pages. The results demonstrate that reliable Latin detection with contemporary models is achievable. Our study provides the first comprehensive analysis of these models' capabilities and limits for this task.

category, large language model, machine learning

Country:
- Europe > France (0.14)

Genre:
- Research Report > New Finding (0.87)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Text Processing (0.93)
  - Machine Learning
    - Neural Networks > Deep Learning (0.69)
    - Performance Analysis > Accuracy (0.46)
