AITopics | Handwriting Recognition

Collaborating Authors

Handwriting Recognition

Computers extract meaning from static handwritten text by processing an image, including separating characters from background noise. Processing text as it is being written often takes account of pen movement and uses special tablets.

News Overviews Instructional Materials AI-Alerts Classics

Robust and Efficient Writer-Independent IMU-Based Handwriting Recognization

Li, Jindong, Hamann, Tim, Barth, Jens, Kaempf, Peter, Zanca, Dario, Eskofier, Bjoern

arXiv.org Artificial IntelligenceFeb-28-2025

Online handwriting recognition (HWR) using data from inertial measurement units (IMUs) remains challenging due to variations in writing styles and the limited availability of high-quality annotated datasets. Traditional models often struggle to recognize handwriting from unseen writers, making writer-independent (WI) recognition a crucial but difficult problem. This paper presents an HWR model with an encoder-decoder structure for IMU data, featuring a CNN-based encoder for feature extraction and a BiLSTM decoder for sequence modeling, which supports inputs of varying lengths. Our approach demonstrates strong robustness and data efficiency, outperforming existing methods on WI datasets, including the WI split of the OnHW dataset and our own dataset. Extensive evaluations show that our model maintains high accuracy across different age groups and writing conditions while effectively learning from limited data. Through comprehensive ablation studies, we analyze key design choices, achieving a balance between accuracy and efficiency. These findings contribute to the development of more adaptable and scalable HWR systems for real-world applications.

artificial intelligence, handwriting recognition, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.20954

Country:

Europe > Germany (0.14)
North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (0.89)

Add feedback

On the Generalization of Handwritten Text Recognition Models

Garrido-Munoz, Carlos, Calvo-Zaragoza, Jorge

arXiv.org Artificial IntelligenceNov-26-2024

Recent advances in Handwritten Text Recognition (HTR) have led to significant reductions in transcription errors on standard benchmarks under the i.i.d. assumption, thus focusing on minimizing in-distribution (ID) errors. However, this assumption does not hold in real-world applications, which has motivated HTR research to explore Transfer Learning and Domain Adaptation techniques. In this work, we investigate the unaddressed limitations of HTR models in generalizing to out-of-distribution (OOD) data. We adopt the challenging setting of Domain Generalization, where models are expected to generalize to OOD data without any prior access. To this end, we analyze 336 OOD cases from eight state-of-the-art HTR models across seven widely used datasets, spanning five languages. Additionally, we study how HTR models leverage synthetic data to generalize. We reveal that the most significant factor for generalization lies in the textual divergence between domains, followed by visual divergence. We demonstrate that the error of HTR models in OOD scenarios can be reliably estimated, with discrepancies falling below 10 points in 70\% of cases. We identify the underlying limitations of HTR models, laying the foundation for future research to address this challenge.

artificial intelligence, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2411.17332

Country:

North America > United States > California (0.14)
Europe > Middle East > Malta (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.63)

Add feedback

Integrating Canonical Neural Units and Multi-Scale Training for Handwritten Text Recognition

Wang, Zi-Rui

arXiv.org Artificial IntelligenceOct-23-2024

The segmentation-free research efforts for addressing handwritten text recognition can be divided into three categories: connectionist temporal classification (CTC), hidden Markov model and encoder-decoder methods. In this paper, inspired by the above three modeling methods, we propose a new recognition network by using a novel three-dimensional (3D) attention module and global-local context information. Based on the feature maps of the last convolutional layer, a series of 3D blocks with different resolutions are split. Then, these 3D blocks are fed into the 3D attention module to generate sequential visual features. Finally, by integrating the visual features and the corresponding global-local context features, a well-designed representation can be obtained. Main canonical neural units including attention mechanisms, fully-connected layer, recurrent unit and convolutional layer are efficiently organized into a network and can be jointly trained by the CTC loss and the cross-entropy loss. Experiments on the latest Chinese handwritten text datasets (the SCUT-HCCDoc and the SCUT-EPT) and one English handwritten text dataset (the IAM) show that the proposed method can make a new milestone.

machine learning, pattern recognition, recognition, (18 more...)

arXiv.org Artificial Intelligence

2410.18374

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.66)

Add feedback

HATFormer: Historic Handwritten Arabic Text Recognition with Transformers

Chan, Adrian, Mijar, Anupam, Saeed, Mehreen, Wong, Chau-Wai, Khater, Akram

arXiv.org Artificial IntelligenceOct-2-2024

Arabic handwritten text recognition (HTR) is challenging, especially for historical texts, due to diverse writing styles and the intrinsic features of Arabic script. Additionally, Arabic handwriting datasets are smaller compared to English ones, making it difficult to train generalizable Arabic HTR models. To address these challenges, we propose HATFormer, a transformer-based encoder-decoder architecture that builds on a state-of-the-art English HTR model. By leveraging the transformer's attention mechanism, HATFormer captures spatial contextual information to address the intrinsic challenges of Arabic script through differentiating cursive characters, decomposing visual representations, and identifying diacritics. Our customization to historical handwritten Arabic includes an image processor for effective ViT information preprocessing, a text tokenizer for compact Arabic text representation, and a training pipeline that accounts for a limited amount of historic Arabic handwriting data. HATFormer achieves a character error rate (CER) of 8.6% on the largest public historical handwritten Arabic dataset, with a 51% improvement over the best baseline in the literature. HATFormer also attains a comparable CER of 4.2% on the largest private non-historical dataset. Our work demonstrates the feasibility of adapting an English HTR method to a low-resource language with complex, language-specific challenges, contributing to advancements in document digitization, information retrieval, and cultural preservation.

machine learning, natural language, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2410.02179

Country: Africa (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (0.88)

Add feedback

Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition

Bhatia, Gagan, Nagoudi, El Moatez Billah, Alwajih, Fakhraddin, Abdul-Mageed, Muhammad

arXiv.org Artificial IntelligenceJul-18-2024

Arabic Optical Character Recognition (OCR) and Handwriting Recognition (HWR) pose unique challenges due to the cursive and context-sensitive nature of the Arabic script. This study introduces Qalam, a novel foundation model designed for Arabic OCR and HWR, built on a SwinV2 encoder and RoBERTa decoder architecture. Our model significantly outperforms existing methods, achieving a Word Error Rate (WER) of just 0.80% in HWR tasks and 1.18% in OCR tasks. We train Qalam on a diverse dataset, including over 4.5 million images from Arabic manuscripts and a synthetic dataset comprising 60k image-text pairs. Notably, Qalam demonstrates exceptional handling of Arabic diacritics, a critical feature in Arabic scripts. Furthermore, it shows a remarkable ability to process high-resolution inputs, addressing a common limitation in current OCR systems. These advancements underscore Qalam's potential as a leading solution for Arabic script recognition, offering a significant leap in accuracy and efficiency.

large language model, machine learning, pattern recognition, (17 more...)

arXiv.org Artificial Intelligence

2407.13559

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

MathWriting: A Dataset For Handwritten Mathematical Expression Recognition

Gervais, Philippe, Fadeeva, Asya, Maksai, Andrii

arXiv.org Artificial IntelligenceApr-16-2024

Online text recognition models have improved a lot in the past few years, because of improvements in model structure and also because of bigger datasets. Mathematical expression (ME) recognition is a more complex task that has not received as much attention. However, the problem is different from text recognition in a number of interesting ways which can prevent improvements on one transfering to the other. Though MEs share with text most of their symbols, they follow a more rigid structure which is also two-dimensional. Where text can be treated to some extent as a one-dimensional problem amenable to sequence modeling, MEs cannot, because the relative position of symbols in space is meaningful.

machine learning, natural language, pattern recognition, (17 more...)

arXiv.org Artificial Intelligence

2404.1069

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.54)
(2 more...)

Add feedback

Representing Online Handwriting for Recognition in Large Vision-Language Models

Fadeeva, Anastasiia, Schlattner, Philippe, Maksai, Andrii, Collier, Mark, Kokiopoulou, Efi, Berent, Jesse, Musat, Claudiu

arXiv.org Artificial IntelligenceFeb-23-2024

The adoption of tablets with touchscreens and styluses is increasing, and a key feature is converting handwriting to text, enabling search, indexing, and AI assistance. Meanwhile, vision-language models (VLMs) are now the go-to solution for image understanding, thanks to both their state-of-the-art performance across a variety of tasks and the simplicity of a unified approach to training, fine-tuning, and inference. While VLMs obtain high performance on image-based tasks, they perform poorly on handwriting recognition when applied naively, i.e., by rendering handwriting as an image and performing optical character recognition (OCR). In this paper, we study online handwriting recognition with VLMs, going beyond naive OCR. We propose a novel tokenized representation of digital ink (online handwriting) that includes both a time-ordered sequence of strokes as text, and as image. We show that this representation yields results comparable to or better than state-of-the-art online handwriting recognizers. Wide applicability is shown through results with two different VLM families, on multiple public datasets. Our approach can be applied to off-the-shelf VLMs, does not require any changes in their architecture, and can be used in both fine-tuning and parameter-efficient tuning. We perform a detailed ablation study to identify the key elements of the proposed representation.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.15307

Country:

North America > United States (0.14)
Europe > Italy (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Add feedback

VATr++: Choose Your Words Wisely for Handwritten Text Generation

Vanherle, Bram, Pippi, Vittorio, Cascianelli, Silvia, Michiels, Nick, Van Reeth, Frank, Cucchiara, Rita

arXiv.org Artificial IntelligenceFeb-16-2024

Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, there remains a critical yet understudied aspect - the impact of the input, both visual and textual, on the HTG model training and its subsequent influence on performance. This study delves deeper into a cutting-edge Styled-HTG approach, proposing strategies for input preparation and training regularization that allow the model to achieve better performance and generalize better. These aspects are validated through extensive analysis on several different settings and datasets. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research - the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2402.10798

Country: Europe > Italy > Emilia-Romagna (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (0.85)

Add feedback

InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write

Mitrevski, Blagoj, Rak, Arina, Schnitzler, Julian, Li, Chengkun, Maksai, Andrii, Berent, Jesse, Musat, Claudiu

arXiv.org Artificial IntelligenceFeb-8-2024

Digital note-taking is gaining popularity, offering a durable, editable, and easily indexable way of storing notes in the vectorized form, known as digital ink. However, a substantial gap remains between this way of note-taking and traditional pen-and-paper note-taking, a practice still favored by a vast majority. Our work, InkSight, aims to bridge the gap by empowering physical note-takers to effortlessly convert their work (offline handwriting) to digital ink (online handwriting), a process we refer to as Derendering. Prior research on the topic has focused on the geometric properties of images, resulting in limited generalization beyond their training domains. Our approach combines reading and writing priors, allowing training a model in the absence of large amounts of paired samples, which are difficult to obtain. To our knowledge, this is the first work that effectively derenders handwritten text in arbitrary photos with diverse visual characteristics and backgrounds. Furthermore, it generalizes beyond its training domain into simple sketches. Our human evaluation reveals that 87% of the samples produced by our model on the challenging HierText dataset are considered as a valid tracing of the input image and 67% look like a pen trajectory traced by a human.

artificial intelligence, handwriting recognition, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2402.05804

Country: Europe > Switzerland (0.14)

Genre:

Research Report (1.00)
Instructional Material > Online (0.34)
Instructional Material > Course Syllabus & Notes (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Arabic Handwritten Text Line Dataset

Bouchal, Hakim, Belaid, Ahror

arXiv.org Artificial IntelligenceDec-10-2023

Segmentation of Arabic manuscripts into lines of text and words is an important step to make recognition systems more efficient and accurate. The problem of segmentation into text lines is solved since there are carefully annotated dataset dedicated to this task. However, To the best of our knowledge, there are no dataset annotating the word position of Arabic texts. In this paper, we present a new dataset specifically designed for historical Arabic script in which we annotate position in word level.

artificial intelligence, dataset, handwriting recognition, (15 more...)

arXiv.org Artificial Intelligence

2312.07573

Country:

North America > United States (0.29)
Africa > Middle East > Algeria (0.22)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (0.52)

Add feedback