Handwritten Text Recognition: A Survey
Garrido-Munoz, Carlos, Rios-Vila, Antonio, Calvo-Zaragoza, Jorge
Handwritten Text Recognition (HTR) has become an essential field within pattern recognition and machine learning, with applications ranging from historical document preservation to modern data entry and accessibility solutions. The complexity of HTR lies in the high variability of handwriting, which makes it challenging to develop robust recognition systems. This survey examines the evolution of HTR models, tracing their progression from early heuristic-based approaches to contemporary state-of-the-art neural models, which leverage deep learning techniques. The scope of the field has also expanded, with models initially capable of recognizing only word-level content progressing to recent end-to-end document-level approaches. Our paper categorizes existing work into two primary levels of recognition: (1) \emph{up to line-level}, encompassing word and line recognition, and (2) \emph{beyond line-level}, addressing paragraph- and document-level challenges. We provide a unified framework that examines research methodologies, recent advances in benchmarking, key datasets in the field, and a discussion of the results reported in the literature. Finally, we identify pressing research challenges and outline promising future directions, aiming to equip researchers and practitioners with a roadmap for advancing the field.
On the Generalization of Handwritten Text Recognition Models
Garrido-Munoz, Carlos, Calvo-Zaragoza, Jorge
Recent advances in Handwritten Text Recognition (HTR) have led to significant reductions in transcription errors on standard benchmarks under the i.i.d. assumption; research has thus focused on minimizing in-distribution (ID) errors. However, this assumption does not hold in real-world applications, which has motivated HTR research to explore Transfer Learning and Domain Adaptation techniques. In this work, we investigate the unaddressed limitations of HTR models in generalizing to out-of-distribution (OOD) data. We adopt the challenging setting of Domain Generalization, where models are expected to generalize to OOD data without any prior access to it. To this end, we analyze 336 OOD cases from eight state-of-the-art HTR models across seven widely used datasets, spanning five languages. Additionally, we study how HTR models leverage synthetic data to generalize. We reveal that the most significant factor for generalization is the textual divergence between domains, followed by visual divergence. We demonstrate that the error of HTR models in OOD scenarios can be reliably estimated, with discrepancies falling below 10 points in 70\% of cases. We identify the underlying limitations of HTR models, laying the foundation for future research to address this challenge.
Spatial Context-based Self-Supervised Learning for Handwritten Text Recognition
Penarrubia, Carlos, Garrido-Munoz, Carlos, Valero-Mas, Jose J., Calvo-Zaragoza, Jorge
Handwritten text recognition (HTR) is the research area in the field of computer vision whose objective is to transcribe the textual content of a written manuscript into a digital machine-readable format [73]. This field not only plays a key role in the current digital era of handwriting by electronic means (such as tablets) [11], but is also of paramount relevance for the preservation, indexing and dissemination of historical manuscripts that exist solely in a physical format [56]. HTR has developed considerably over the last decade owing to the emergence of Deep Learning [57], which has greatly increased its performance. However, in order to attain competitive results, these solutions usually require large volumes of manually labelled data, which constitutes their principal bottleneck. One means by which to alleviate this problem, Self-Supervised Learning (SSL), has recently gained considerable attention from the research community [61]. SSL employs what is termed a pretext task to leverage collections of unlabelled data for the training of neural models in order to obtain descriptive and intelligible representations [8], thus reducing the need for large amounts of labelled data [4]. Pretext tasks can be framed in different categories according to their working principle [34, 61], with the following being some of the main existing families: (i) image generation strategies [63, 46], which focus on recovering the original distribution of the data from defined distortions or corruptions; (ii) contrastive learning methods [60, 33], whose objective is to learn representative and discernible codifications of the data, and (iii) spatial context methods [27, 58], which focus on either estimating geometric transformations performed on the data [27]--i.e.