Machine learning techniques have been successfully applied to Chinese character recognition; nonetheless, automatic generation of stylized Chinese handwriting remains a challenge. In this paper, we propose StrokeBank, a novel approach to automating personalized Chinese handwriting generation. We use a semi-supervised algorithm to construct a dictionary of component mappings from a small seed set. Unlike previous work, our approach requires neither human supervision in stroke extraction nor knowledge of the structure of Chinese characters. This dictionary is then used to generate handwriting that preserves stylistic variations, including cursiveness and the spatial layout of strokes. We demonstrate the effectiveness of our model through a survey-based evaluation. The results show that our generated characters are nearly indistinguishable from ground-truth handwriting.
There is no commercial character recognition software that supports Thai handwriting. Thai handwritten character recognition is needed to convert text handwritten on mobile and tablet devices into computer-encoded text. We propose a novel method that combines three curve signatures. The first signature, the normalized tangent angle function (TAF), provides a rough classification. The other two, both novel, are the relative position matrix (RPM), which is used to compare global curve features, and the straightened tangent angle function (STAF), which is used to compare the tangent angle along the cumulative unsigned curvature domain. In the recognition process, these three signatures are extracted from an input curve, and its similarity to each character in the handwriting templates is measured. The similarity scores are then weighted and summed to rank the candidate characters. Our experiments use 48 handwriting sample sets (each set contains all 44 Thai consonants, and there are 4 sets per writer). Our method achieves an accuracy of 94.08% for personal handwriting and 92.23% for general handwriting.
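The two tangent-angle signatures can be illustrated with a short Python sketch that computes a TAF and a STAF for a polyline stroke and ranks templates by a weighted sum of their similarities. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`taf`, `staf`, `similarity`, `rank`), the resampling size `k`, the similarity measure `1 / (1 + mean absolute difference)`, and the equal weights are all hypothetical. The STAF here assumes that curvature is concentrated at polyline vertices, so straight runs collapse and the angle becomes piecewise linear in cumulative unsigned turning. The RPM is omitted because its construction is not described above.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def _clean(curve: Sequence[Point]) -> List[Point]:
    """Drop consecutive duplicate points, whose tangent would be undefined."""
    out = [curve[0]]
    for p in curve[1:]:
        if p != out[-1]:
            out.append(p)
    return out


def _angles(pts: List[Point]) -> List[float]:
    """Tangent angle of each polyline segment, unwrapped to avoid 2*pi jumps."""
    raw = [math.atan2(y1 - y0, x1 - x0) for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    out = [raw[0]]
    for a in raw[1:]:
        d = (a - out[-1] + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
        out.append(out[-1] + d)
    return out


def taf(curve: Sequence[Point], k: int = 64) -> List[float]:
    """Assumed TAF: step function of normalized arc length, sampled at k points."""
    pts = _clean(curve)
    theta = _angles(pts)
    lengths = [math.dist(p, q) for p, q in zip(pts, pts[1:])]
    total = sum(lengths)
    starts, acc = [], 0.0  # normalized arc length where each segment begins
    for seg in lengths:
        starts.append(acc / total)
        acc += seg
    out, j = [], 0
    for i in range(k):
        t = i / k
        while j + 1 < len(starts) and starts[j + 1] <= t:
            j += 1
        out.append(theta[j])
    return out


def staf(curve: Sequence[Point], k: int = 64) -> List[float]:
    """Assumed STAF: tangent angle along normalized cumulative unsigned turning.

    Assumption: curvature sits at the vertices, so straight runs collapse to a
    point and the angle is piecewise linear through (c_i, theta_i).
    """
    theta = _angles(_clean(curve))
    c = [0.0]
    for a, b in zip(theta, theta[1:]):
        c.append(c[-1] + abs(b - a))
    if c[-1] == 0.0:  # no turning at all: the tangent angle is constant
        return [theta[0]] * k
    xs = [v / c[-1] for v in c]
    out, j = [], 0
    for i in range(k):
        t = i / (k - 1)
        while j + 2 < len(xs) and xs[j + 1] < t:
            j += 1
        x0, x1, y0, y1 = xs[j], xs[j + 1], theta[j], theta[j + 1]
        out.append(y1 if x1 == x0 else y0 + (y1 - y0) * (t - x0) / (x1 - x0))
    return out


def similarity(f: List[float], g: List[float]) -> float:
    """Hypothetical similarity: 1 / (1 + mean absolute difference)."""
    d = sum(abs(a - b) for a, b in zip(f, g)) / len(f)
    return 1.0 / (1.0 + d)


def rank(curve: Sequence[Point], templates: Dict[str, Sequence[Point]],
         weights: Tuple[float, float] = (0.5, 0.5), k: int = 64) -> List[str]:
    """Rank labels by weighted TAF + STAF similarity (placeholder weights)."""
    q = (taf(curve, k), staf(curve, k))
    scored = []
    for label, tmpl in templates.items():
        t = (taf(tmpl, k), staf(tmpl, k))
        score = sum(w * similarity(a, b) for w, a, b in zip(weights, q, t))
        scored.append((score, label))
    return [label for _, label in sorted(scored, reverse=True)]


templates = {
    "horizontal": [(0, 0), (1, 0), (2, 0)],
    "vertical": [(0, 0), (0, 1), (0, 2)],
    "corner": [(0, 0), (1, 0), (1, 1)],
}
print(rank([(0, 0), (1, 0.05), (2, 0)], templates))
```

Because straight segments occupy no width in the STAF domain, two strokes that differ only in the lengths of their straight runs have nearly identical STAFs while their TAFs still differ; this is one plausible reason for comparing both signatures.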
We introduce Independently Recurrent Long Short-Term Memory cells: IndyLSTMs. These differ from regular LSTM cells in that the recurrent weights are modeled not as a full matrix but as a diagonal matrix, i.e., the output and state of each LSTM cell depend on the inputs and its own output/state, as opposed to the inputs and the outputs/states of all the cells in the layer. The number of parameters per IndyLSTM layer, and thus the number of FLOPs per evaluation, is linear in the number of nodes in the layer, as opposed to quadratic for regular LSTM layers, potentially resulting in models that are both smaller and faster. We evaluate their performance experimentally by training several models on the popular IAM-OnDB and CASIA online handwriting datasets, as well as on several of our in-house datasets. We show that IndyLSTMs, despite their smaller size, consistently outperform regular LSTMs both in accuracy per parameter and in best overall accuracy. We attribute this improved performance to IndyLSTMs being less prone to overfitting.
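The difference between the two cell types can be sketched as a single time step in plain Python. The sketch assumes the standard LSTM gate equations (input, forget, and output gates plus a tanh cell candidate) with the full recurrent matrix replaced by a per-unit weight vector `u` applied elementwise to the previous output `h`, so unit `k` sees the whole input `x` but only its own `h[k]` and `c[k]`. The class and helper names, the random initialization, and the parameter-count formulas (one bias per gate, no peephole connections) are illustrative assumptions, not the paper's code.

```python
import math
import random
from typing import List, Tuple

GATES = ("i", "f", "o", "g")  # input, forget, output gates and cell candidate


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class IndyLSTMCell:
    """Hypothetical IndyLSTM layer: full input weights, diagonal recurrent weights."""

    def __init__(self, n_in: int, n_units: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_in, self.n_units = n_in, n_units
        # W[gate] is an n_units x n_in matrix, as in a regular LSTM.
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                         for _ in range(n_units)] for gate in GATES}
        # u[gate] holds only the diagonal of the recurrent matrix: one weight per unit.
        self.u = {gate: [rng.uniform(-0.1, 0.1) for _ in range(n_units)] for gate in GATES}
        self.b = {gate: [0.0] * n_units for gate in GATES}

    def step(self, x: List[float], h: List[float],
             c: List[float]) -> Tuple[List[float], List[float]]:
        """Advance one time step; unit k reads all of x but only h[k] and c[k]."""
        z = {gate: [sum(w * xi for w, xi in zip(self.W[gate][k], x))
                    + self.u[gate][k] * h[k] + self.b[gate][k]
                    for k in range(self.n_units)] for gate in GATES}
        i = [sigmoid(v) for v in z["i"]]
        f = [sigmoid(v) for v in z["f"]]
        o = [sigmoid(v) for v in z["o"]]
        cand = [math.tanh(v) for v in z["g"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def indylstm_params(n_in: int, n_units: int) -> int:
    """Per gate: n_units*n_in input weights, n_units recurrent weights, n_units biases."""
    return 4 * (n_units * n_in + n_units + n_units)


def lstm_params(n_in: int, n_units: int) -> int:
    """Per gate: n_units*n_in input weights, n_units**2 recurrent weights, n_units biases."""
    return 4 * (n_units * n_in + n_units * n_units + n_units)


print(indylstm_params(16, 128), lstm_params(16, 128))
```

For 16 inputs and 128 units, these formulas give 4 × 128 × (16 + 2) = 9,216 parameters for the IndyLSTM layer versus 4 × (128 × 16 + 128² + 128) = 74,240 for the regular LSTM layer; the recurrent term is what grows quadratically.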
Manually transcribing large amounts of handwritten data is an arduous and error-prone process. Automated handwriting recognition can drastically reduce the time required to transcribe large volumes of text and can also serve as a framework for developing future applications of machine learning. Handwritten character recognition is an ongoing field of research encompassing artificial intelligence, computer vision, and pattern recognition. A handwriting recognition algorithm acquires input from images or touch-screen devices, detects its characteristics, and converts it into a machine-readable form. There are two basic types of handwriting recognition systems: online systems, which process pen trajectories as they are captured on a touch screen or tablet, and offline systems, which process static images of handwriting such as scanned pages.
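The two system types differ mainly in their input representation. As a minimal sketch with hypothetical names, an online sample can be modeled as a list of strokes of `(x, y, timestamp)` points; rendering it into a binary bitmap with the illustrative `rasterize` function yields the kind of input an offline system works from, and the timing and stroke order are lost in the process.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # hypothetical (x, y, timestamp) from a pen or touch screen
Stroke = List[Sample]                # samples between pen-down and pen-up


def rasterize(strokes: List[Stroke], width: int, height: int) -> List[List[int]]:
    """Render online strokes into a width x height binary image.

    Timing and stroke order are discarded, so the result resembles the
    static image an offline system works from.
    """
    img = [[0] * width for _ in range(height)]
    for stroke in strokes:
        pts = [(x, y) for x, y, _ in stroke]
        # A single-sample stroke (a dot) is drawn as a zero-length segment.
        segments = list(zip(pts, pts[1:])) or [(pts[0], pts[0])]
        for (x0, y0), (x1, y1) in segments:
            # Sample each segment about once per pixel along its longer axis.
            steps = max(1, int(max(abs(x1 - x0), abs(y1 - y0))))
            for s in range(steps + 1):
                x = round(x0 + (x1 - x0) * s / steps)
                y = round(y0 + (y1 - y0) * s / steps)
                if 0 <= x < width and 0 <= y < height:
                    img[y][x] = 1
    return img


# Online input: the two strokes of a "+" sign, with timestamps in seconds.
plus = [
    [(0, 2, 0.00), (4, 2, 0.12)],  # horizontal bar
    [(2, 0, 0.30), (2, 4, 0.41)],  # vertical bar
]
for row in rasterize(plus, 5, 5):
    print("".join("#" if v else "." for v in row))
```

Reversing the stroke order or changing the timestamps produces the same bitmap, so an offline recognizer cannot use those cues, whereas an online recognizer can.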