We developed a system for finding address blocks on mail pieces that can process four images per second. Besides locating the address block, the system determines the writing style (handwritten or machine printed), measures the skew angle of the text lines, and cleans noisy images. A layout analysis of all elements present in the image is performed to distinguish drawings and dirt from text and to separate advertising text from the destination address. A speed of more than four images per second is obtained on a modular hardware platform containing a board with two NET32K neural net chips, a SPARC2 processor board, and a board with two digital signal processors. The system has been tested with more than 100,000 images. Its performance depends on image quality, ranging from 85% correct location in very noisy images to over 98% in cleaner images.
In Italy, 120 high school students helped solve a centuries-old problem: how to give researchers access to the Vatican Secret Archives, a massive collection of documents detailing the Vatican's activities as far back as the eighth century. That should look pretty great on their college applications. The shelves of the Vatican Secret Archives are about 85 kilometers (53 miles) long and house 35,000 volumes of catalogues. But the documents that researchers have scanned and uploaded take up less than an inch. That's because the Vatican seems to not have wanted to share the information.
This paper presents a recognition system for handwritten Pashto letters. Handwritten character recognition is a challenging task: the letters differ in shape and style and also vary among individual writers. Recognition is made harder still by the lack of standard datasets for handwritten Pashto letters. In this work, we have designed a database of moderate size comprising 4488 images, with 102 distinct samples for each of the 44 letters in Pashto. The recognition framework uses a zoning feature extractor followed by K-Nearest Neighbour (KNN) and Neural Network (NN) classifiers to classify individual letters. In the evaluation of the proposed system, an overall classification accuracy of approximately 70.05% is achieved using KNN and 72% using NN.
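The zoning-plus-KNN pipeline described above can be illustrated with a minimal sketch. The abstract does not give the zone grid size, the per-zone feature, the distance metric, or the value of k, so everything here is an assumption: the sketch uses the fraction of foreground pixels per zone, Euclidean distance, and majority voting. The function names `zoning_features` and `knn_predict`, the toy 4x4 images, and the labels are hypothetical and only serve the illustration.

```python
import math
from collections import Counter


def zoning_features(image, zones_per_side=2):
    """Split a binary image (list of rows of 0/1) into a square grid of zones
    and return the fraction of foreground pixels in each zone, row by row.
    The density feature and the grid size are assumptions, not the paper's."""
    height, width = len(image), len(image[0])
    features = []
    for zr in range(zones_per_side):
        r0 = zr * height // zones_per_side
        r1 = (zr + 1) * height // zones_per_side
        for zc in range(zones_per_side):
            c0 = zc * width // zones_per_side
            c1 = (zc + 1) * width // zones_per_side
            ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            area = (r1 - r0) * (c1 - c0)
            features.append(ink / area if area else 0.0)
    return features


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training vectors closest to
    `query` under Euclidean distance (an assumed metric)."""
    neighbours = sorted(
        (math.dist(feat, query), label)
        for feat, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy 4x4 "letters" (hypothetical data): a vertical and a horizontal stroke.
VERTICAL_BAR = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
HORIZONTAL_BAR = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
NOISY_VERTICAL = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0]]

train_x = [zoning_features(VERTICAL_BAR), zoning_features(HORIZONTAL_BAR)]
train_y = ["vertical", "horizontal"]
print(knn_predict(train_x, train_y, zoning_features(NOISY_VERTICAL), k=1))
```

With real data, each of the 4488 images would be binarised and size-normalised before zoning; a finer grid gives a longer feature vector and usually calls for tuning k on held-out samples.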
Amharic is the official language of the Federal Democratic Republic of Ethiopia. There are many historic Amharic and Ethiopic handwritten documents addressing a range of issues, including governance, science, religion, social rules, culture, and art, which constitute very rich indigenous knowledge. The Amharic language has its own alphabet, derived from Ge'ez, which is currently the liturgical language in Ethiopia. Handwritten character recognition for non-Latin scripts like Amharic has not been addressed, especially using the advantages of state-of-the-art techniques. This research work designs, for the first time, a model for Amharic handwritten character recognition using a convolutional neural network. The dataset was assembled from collected samples of handwritten documents, and data augmentation was applied for machine learning. The model was further enhanced using multi-task learning based on the relationships between the characters. Promising results are observed from the latter model, which can further be applied to word prediction.