WriteViT: Handwritten Text Generation with Vision Transformer

Dang Hoai Nam, Huynh Tong Dang Khoa, Vo Nguyen Le Duy

arXiv.org Artificial Intelligence 

Humans can quickly generalize handwriting styles from a single example by intuitively separating content from style. Motivated by this gap between human and machine capability, we introduce WriteViT, a one-shot handwritten text synthesis framework built on Vision Transformers (ViT), a family of models that has shown strong performance across a wide range of computer vision tasks. WriteViT integrates a ViT-based Writer Identifier that extracts style embeddings, a multi-scale generator built from Transformer encoder-decoder blocks enhanced with conditional positional encoding (CPE), and a lightweight ViT-based recognizer. Whereas previous methods typically rely on CNNs or CRNNs, our design uses Transformers in key components to better capture both fine-grained stroke details and higher-level style information. Although handwritten text synthesis has been widely explored, its application to Vietnamese, a language rich in diacritics and complex typography, remains limited. Experiments on Vietnamese and English datasets demonstrate that WriteViT produces high-quality, style-consistent handwriting while maintaining strong recognition performance in low-resource scenarios.

Preprint submitted to arXiv May 31, 2025

1. Introduction

Despite significant technological advancements, handwritten text continues to play a critical role in domains such as historical archiving, form processing, and educational assessment. Consequently, handwritten text recognition (HTR) remains a key research area in document analysis. However, the task poses persistent challenges due to the inherent variability of handwriting.
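The conditional positional encoding (CPE) mentioned above replaces fixed positional embeddings with positions computed from the tokens themselves, typically via a depthwise convolution over the 2-D token grid whose output is added back to the tokens. The sketch below illustrates that idea only; it is not the authors' implementation, and the function name, grid size, and per-channel kernel are illustrative assumptions.

```python
import numpy as np

def conditional_positional_encoding(tokens, h, w, kernel):
    """Illustrative CPE: reshape the token sequence (h*w, d) into an
    h x w grid, apply a zero-padded depthwise 3x3 convolution per
    channel, and add the result back to the tokens as a position
    signal that depends on the input (hypothetical sketch)."""
    hw, d = tokens.shape
    grid = tokens.reshape(h, w, d)
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))  # zero-pad the spatial dims
    out = np.zeros_like(grid)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3, :]           # 3x3 neighborhood
            out[i, j] = np.einsum('xyd,xyd->d', patch, kernel)  # depthwise conv
    return tokens + out.reshape(hw, d)

# Toy usage: a 4x4 grid of 8-dimensional tokens with one 3x3 filter per channel.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))
kernel = rng.standard_normal((3, 3, 8))
encoded = conditional_positional_encoding(tokens, 4, 4, kernel)
```

Because the zero padding breaks translation invariance at the borders, the convolution output varies with absolute position, which is what lets it serve as a positional signal without a fixed embedding table.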
