Text Recognition
Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition
Li, Bingcong, Tang, Xin, Qi, Xianbiao, Chen, Yihao, Xiao, Rong
Recently, inspired by the Transformer, self-attention-based scene text recognition approaches have achieved outstanding performance. However, we find that the model size grows rapidly as the lexicon increases. Specifically, the number of parameters in the softmax classification layer and the output embedding layer is proportional to the vocabulary size. This hinders the development of lightweight text recognition models, especially for Chinese and multilingual settings. Thus, we propose a lightweight scene text recognition model named Hamming OCR. In this model, a novel Hamming classifier, which adopts a locality sensitive hashing (LSH) algorithm to encode each character, is proposed to replace the softmax regression, and the generated LSH code is directly employed to replace the output embedding. We also present a simplified transformer decoder that reduces the number of parameters by removing the feed-forward network and using a cross-layer parameter sharing technique. Compared with traditional methods, the number of parameters in both the classification and embedding layers is independent of the vocabulary size, which significantly reduces the storage requirement without loss of accuracy. Experimental results on several datasets, including four public benchmarks and a Chinese text dataset synthesized by SynthText with more than 20,000 characters, show that Hamming OCR achieves competitive results.
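The idea of replacing a softmax over the vocabulary with a fixed-length binary code can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes sign-random-projection LSH over hypothetical per-character feature vectors (random here), and the names `lsh_codes`, `hamming`, and `decode` are made up for illustration. The point is only that the model predicts `n_bits` values regardless of how many characters exist, and decoding picks the character whose code is nearest in Hamming distance.

```python
import random

def lsh_codes(features, n_bits, seed=0):
    """Sign-random-projection LSH: one bit per random hyperplane."""
    rng = random.Random(seed)
    dim = len(next(iter(features.values())))
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return {
        ch: tuple(1 if sum(w * x for w, x in zip(p, vec)) > 0 else 0 for p in planes)
        for ch, vec in features.items()
    }

def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(bit_scores, codes):
    """Threshold per-bit scores at zero, then return the nearest character."""
    bits = tuple(1 if s > 0 else 0 for s in bit_scores)
    return min(codes, key=lambda ch: hamming(bits, codes[ch]))

# Hypothetical character features; a real system would learn or derive these.
rng = random.Random(1)
vocab = "abcdefghij"
features = {ch: [rng.gauss(0.0, 1.0) for _ in range(16)] for ch in vocab}
codes = lsh_codes(features, n_bits=32)
```

With this layout the output layer needs on the order of `n_bits * d` weights for a hidden size `d`, instead of `V * d` for a vocabulary of `V` characters, which is the scaling property the abstract describes.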
Text Recognition in Flutter Using Firebase's ML Kit
Firebase's ML Kit enables you to recognize text in any Latin-based language. It can also detect multiple languages in a single image. Implementing text recognition in your application can automate tedious data entry tasks for receipts, credit cards, and business cards, to mention just a few. The first step involves adding Firebase to your Flutter project. This is done by creating a Firebase project and registering your app.
Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data
Lou, Xinghua, Kansky, Ken, Lehrach, Wolfgang, Laan, CC, Marthi, Bhaskara, Phoenix, D., George, Dileep
We demonstrate that a generative model for object shapes can achieve state of the art results on challenging scene text recognition tasks, and with orders of magnitude fewer training images than required for competing discriminative methods. In addition to transcribing text from challenging images, our method performs fine-grained instance segmentation of characters. We show that our model is more robust to both affine transformations and non-affine deformations compared to previous approaches. Papers published at the Neural Information Processing Systems Conference.
Build a Handwritten Text Recognition System using TensorFlow
Offline Handwritten Text Recognition (HTR) systems transcribe text contained in scanned images into digital text; an example is shown in Figure 1. We will build a Neural Network (NN) which is trained on word-images from the IAM dataset. As the input layer (and therefore also all the other layers) can be kept small for word-images, NN-training is feasible on the CPU (of course, a GPU would be better). This implementation is the bare minimum that is needed for HTR using TF. We use a NN for our task.
Machine Learning with Python: NLP and Text Recognition
Student and freelance AI / Big Data Developer with a passion for full stack. In this article, I apply a series of natural language processing techniques to a dataset containing reviews about businesses. After that, I train a model using Logistic Regression to predict whether a review is "positive" or "negative". The natural language processing field contains a series of tools that are very useful for extracting, labeling, and predicting information starting from raw text data. This collection of techniques is mainly used in emotion recognition, text tagging (for example, to automate the process of sorting complaints from a client), chatbots, and voice assistants.
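As a rough illustration of the pipeline described above (text to features, then Logistic Regression for a positive/negative label), here is a minimal pure-Python sketch. The four reviews, the bag-of-words featurization, the learning rate, and the function names are all assumptions made for this example; the article's actual dataset and tooling are not shown here, and a library implementation would normally be used instead.

```python
import math

# Toy labelled reviews (hypothetical data): 1 = positive, 0 = negative.
reviews = [
    ("great food friendly staff", 1),
    ("terrible service cold food", 0),
    ("loved it great place", 1),
    ("awful rude staff", 0),
]

# Bag-of-words vocabulary: one feature per distinct word.
vocab = sorted({w for text, _ in reviews for w in text.split()})
index = {w: i for i, w in enumerate(vocab)}

def featurize(text):
    """Count how often each known word occurs; unknown words are ignored."""
    vec = [0.0] * len(vocab)
    for w in text.split():
        if w in index:
            vec[index[w]] += 1.0
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient ascent."""
    weights, bias = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for text, label in data:
            x = featurize(text)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = label - p  # gradient of the log-likelihood w.r.t. the score
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias

weights, bias = train(reviews)

def predict_proba(text):
    """Estimated probability that a review is positive."""
    x = featurize(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
```

Calling `predict_proba("great place")` returns the model's estimated probability that the review is positive; thresholding it at 0.5 gives the "positive" or "negative" label.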
ML Kit Android: Implementing Text Recognition -- Firebase
Firebase is now set up, so we can start building our Text Recognition app. We need the Firebase ML Vision dependency, which we add to our app-level build.gradle. After capturing the image from the camera, we'll set the image into the ImageView. Our app is ready to use. Run the app and click on the camera icon to launch the camera on your Android device. Take a picture of some text, then click on the tick icon and watch Firebase do the magic for you.
Dropbox text recognition makes it easier to find images and PDFs
There's nothing worse than having to pore over a pile of PDFs containing documents scanned as images when you quickly have to find a specific file. Dropbox is making it easier to do that by introducing automatic image recognition, which extracts text from photos and PDFs and makes it searchable. According to the cloud storage provider, there are 20 billion image and PDF files stored on Dropbox. Around 10 to 20 percent of those are photos of documents, so the new feature can be very useful. To look for a specific photo or PDF, you simply have to type in a keyword or phrase like you would on a search engine.
Facebook is making AI that can identify offensive memes
Facebook's moderators can't possibly look through every single image that gets posted on the enormous platform, so Facebook is building AI to help them out. In a blog post today, Facebook describes a system it's built called Rosetta that uses machine learning to identify text in images and videos and then transcribe it into something that's machine readable. In particular, Facebook is finding this tool helpful for transcribing the text on memes. Text transcription tools are nothing new, but Facebook faces different challenges because of the size of its platform and the variety of the images it sees. Rosetta is said to be live now, extracting text from 1 billion images and video frames per day across both Facebook and Instagram.
Double Supervised Network with Attention Mechanism for Scene Text Recognition
Gao, Yuting, Huang, Zheng, Dai, Yuchen
In this paper, we propose Double Supervised Network with Attention Mechanism (DSAN), a novel end-to-end trainable framework for scene text recognition. It incorporates a text attention module during feature extraction, which forces the model to focus on text regions, and the whole framework is supervised by two branches. One supervision branch comes from context-level modelling, and the other comes from an extra supervision enhancement branch which aims at tackling inexplicit semantic information at the character level. These two supervisions can benefit each other and yield better performance. The proposed approach can recognize text of arbitrary length and does not need any predefined lexicon. Our method outperforms the current state-of-the-art methods on three text recognition benchmarks, IIIT5K, ICDAR2013 and SVT, reaching accuracies of 88.6%, 92.3% and 84.1% respectively, which suggests the effectiveness of the proposed method.
STN-OCR: A single Neural Network for Text Detection and Text Recognition
STN-OCR, a single semi-supervised Deep Neural Network (DNN), consists of a spatial transformer network, which is used to detect text regions in images, and a text recognition network, which recognizes the textual content of the identified text regions. STN-OCR is an end-to-end scene text recognition system, but it is not easy to train. The model is mostly able to detect text in differently arranged lines in images, while also recognizing the content of these words. The overview of the system is shown in Figure 1. Compared with most current text recognition systems, which extract all the information from the image at once, STN-OCR behaves more like a human.