Transcribing Spanish Texts from the Past: Experiments with Transkribus, Tesseract and Granite

Torterolo-Orta, Yanco Amor, Macicior-Mitxelena, Jaione, Miguez-Lamanuzzi, Marina, García-Serrano, Ana

Jul-8-2025–arXiv.org Artificial Intelligence

This article presents the experiments and results obtained by the GRESEL team in the IberLEF 2025 shared task PastReader: Transcribing Texts from the Past. Three types of experiments were conducted with the dual aim of participating in the task and enabling comparisons across different approaches. These included the use of a web-based OCR service, a traditional OCR engine, and a compact multimodal model. All experiments were run on consumer-grade hardware, which, despite lacking high-performance computing capacity, provided sufficient storage and stability. The results, while satisfactory, leave room for further improvement. Future work will focus on exploring new techniques and ideas using the Spanish-language dataset provided by the shared task, in collaboration with Biblioteca Nacional de España (BNE).

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

Jul-8-2025

arXiv.org PDF

Add feedback

Country:
- Europe > Spain (0.88)
- Asia > Middle East
  - UAE (0.28)

Genre:
- Research Report (0.82)

Industry:
- Information Technology (0.68)
- Media (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (0.67)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found