Goto

Collaborating Authors

 Text Recognition


Text Recognition for Video in Microsoft Video Indexer

#artificialintelligence

In Video Indexer, we have the capability for recognizing display text in videos. This blog explains some of the techniques we used to extract the best quality data. To start, take a look at the sequence of frames below. Did you manage to recognize the text in the images? It is highly reasonable that you did, without even noticing.


SqueezedText: A Real-Time Scene Text Recognition by Binary Convolutional Encoder-Decoder Network

AAAI Conferences

A new approach for real-time scene text recognition is proposed in this paper. A novel binary convolutional encoder-decoder network (B-CEDNet) together with a bidirectional recurrent neural network (Bi-RNN). The B-CEDNet is engaged as a visual front-end to provide elaborated character detection, and a back-end Bi-RNN performs character-level sequential correction and classification based on learned contextual knowledge. The front-end B-CEDNet can process multiple regions containing characters using a one-off forward operation, and is trained under binary constraints with significant compression. Hence it leads to both remarkable inference run-time speedup as well as memory usage reduction. With the elaborated character detection, the back-end Bi-RNN merely processes a low dimension feature sequence with category and spatial information of extracted characters for sequence correction and classification. By training with over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall rate of 0.86, precision of 0.88 and F-score of 0.87 on ICDAR-03 and ICDAR-13. With the correction and classification by Bi-RNN, the proposed real-time scene text recognition achieves state-of-the-art accuracy while only consumes less than 1-ms inference run-time. The flow processing flow is realized on GPU with a small network size of 1.01 MB for B-CEDNet and 3.23 MB for Bi-RNN, which is much faster and smaller than the existing solutions.


Text detection API showdown: Google vision vs Microsoft Vs Amazon

#artificialintelligence

Detecting and reading text from photos has multiple use cases, be it clicking a picture of a printed text and automatically converting it into a digital file or the new age application of reading bills and invoices.


Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data

Neural Information Processing Systems

Abstract: We demonstrate that a generative model for object shapes can achieve state of the art results on challenging scene text recognition tasks, and with orders ofmagnitude fewer training images than required for competing discriminative methods.In addition to transcribing text from challenging images, our method performs fine-grained instance segmentation of characters. We show that our model is more robust to both affine transformations and non-affine deformations comparedto previous approaches.