Goto

Collaborating Authors

OCR for financial documents

#artificialintelligence

If you have a relative working in the banking industry, ask the person what annoys him/her most about the job. You will surely receive an answer that is related to the task of data entry i.e. the practice of manually entering serial numbers and names from financial documents into the bank's database. It might interest your relative to know that the entire process of data entry can be automated. HOW! he or she might ask. Nanonets supports data extraction from all major financial documents.


Computer Vision: Python OCR & Object Detection Quick Starter

#artificialintelligence

This is the third course from my Computer Vision series. Image Recognition, Object Detection, Object Recognition and also Optical Character Recognition are among the most used applications of Computer Vision. Using these techniques, the computer will be able to recognize and classify either the whole image, or multiple objects inside a single image predicting the class of the objects with the percentage accuracy score. Using OCR, it can also recognize and convert text in the images to machine readable format like text or a document. Object Detection and Object Recognition is widely used in many simple applications and also complex ones like self driving cars.


Computer Vision: Python OCR & Object Detection Quick Starter

#artificialintelligence

This is the third course from my Computer Vision series. Image Recognition, Object Detection, Object Recognition and also Optical Character Recognition are among the most used applications of Computer Vision. Using these techniques, the computer will be able to recognize and classify either the whole image, or multiple objects inside a single image predicting the class of the objects with the percentage accuracy score. Using OCR, it can also recognize and convert text in the images to machine readable format like text or a document. Object Detection and Object Recognition is widely used in many simple applications and also complex ones like self driving cars.


Multi-label classification of promotions in digital leaflets using textual and visual information

arXiv.org Artificial Intelligence

Product descriptions in e-commerce platforms contain detailed and valuable information about retailers assortment. In particular, coding promotions within digital leaflets are of great interest in e-commerce as they capture the attention of consumers by showing regular promotions for different products. However, this information is embedded into images, making it difficult to extract and process for downstream tasks. In this paper, we present an end-to-end approach that classifies promotions within digital leaflets into their corresponding product categories using both visual and textual information. Our approach can be divided into three key components: 1) region detection, 2) text recognition and 3) text classification. In many cases, a single promotion refers to multiple product categories, so we introduce a multi-label objective in the classification head. We demonstrate the effectiveness of our approach for two separated tasks: 1) image-based detection of the descriptions for each individual promotion and 2) multi-label classification of the product categories using the text from the product descriptions. We train and evaluate our models using a private dataset composed of images from digital leaflets obtained by Nielsen. Results show that we consistently outperform the proposed baseline by a large margin in all the experiments.


Detection and Recognition of Text Embedded in Online Images via Neural Context Models

AAAI Conferences

We address the problem of detecting and recognizing the text embedded in online images that are circulated over the Web. Our idea is to leverage context information for both text detection and recognition. For detection, we use local image context around the text region, based on that the text often sequentially appear in online images. For recognition, we exploit the metadata associated with the input online image, including tags, comments, and title, which are used as a topic prior for the word candidates in the image. To infuse such two sets of context information, we propose a contextual text spotting network (CTSN). We perform comparative evaluation with five state-of-the-art text spotting methods on newly collected Instagram and Flickr datasets. We show that our approach that benefits from context information is more successful for text spotting in online images.