Optical Character Recognition
How To Capitalize Words Using AI
Have you ever faced a large corpus of text with missing capitalization, where thousands of words had to be uppercased before the text could be published? In this post, I demonstrate how to restore case information in documents automatically. Truecasing is the natural language processing problem of finding the proper capitalization of words in a text where that information is unavailable. Use cases include transcripts from various audio sources, automatic speech recognition, optical character recognition, medical records, online messaging, and gaming.
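As a rough illustration of truecasing, here is a minimal frequency-based sketch. It is not the method from the post, which is not shown in this excerpt. It assumes a small, correctly cased reference corpus is available and restores each word to the casing it most often has in that corpus. The function names `train_truecaser` and `truecase` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def train_truecaser(cased_text):
    """Learn the most frequent casing of each word from correctly cased text."""
    counts = defaultdict(Counter)
    # Split into sentences so the first word of each one can be skipped:
    # it is capitalized because of its position, not because of the word itself.
    for sentence in re.split(r"(?<=[.!?])\s+", cased_text):
        tokens = re.findall(r"[A-Za-z']+", sentence)
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def truecase(text, model):
    """Restore casing in lowercased text using the learned model."""
    def restore(match):
        word = match.group(0)
        # Unknown words are left unchanged.
        return model.get(word.lower(), word)

    restored = re.sub(r"[A-Za-z']+", restore, text)
    # Capitalize the first letter of every sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        restored,
    )


corpus = "Yesterday we met Alice in Paris. Then Alice flew to Berlin with Bob."
model = train_truecaser(corpus)
print(truecase("alice met bob in paris. then bob flew to berlin.", model))
# prints: Alice met Bob in Paris. Then Bob flew to Berlin.
```

A lookup table like this cannot resolve words whose casing depends on context (for example "apple" versus "Apple"), which is where learned sequence models become useful.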
Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction using Sequences
Wang, Jiapeng, Wang, Tianwei, Tang, Guozhi, Jin, Lianwen, Ma, Weihong, Ding, Kai, Huang, Yichao
Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain text and then use token-level entity annotations as supervision to train a sequence tagging model. However, this approach incurs high annotation costs and is prone to label confusion, and OCR errors also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning framework called TCPN (Tag, Copy or Predict Network), which introduces 1) an efficient encoder to simultaneously model the semantic and layout information in 2D OCR results; 2) a weakly-supervised training strategy that utilizes only key information sequences as supervision; and 3) a flexible and switchable decoder which contains two inference modes: one (Copy or Predict Mode) outputs key information sequences of different categories by copying a token from the input or predicting one in each time step, and the other (Tag Mode) directly tags the input sequence in a single forward pass. Our method shows new state-of-the-art performance on several public benchmarks, which demonstrates its effectiveness.
Best OCR by Text Extraction Accuracy in 2021
Optical Character Recognition (OCR) is a field of machine learning that is specialized in distinguishing characters within images like scanned documents, printed books, or photos. Although it is a mature technology, there are still no OCR products that can recognize all kinds of text 100% accurately. Among the products that we benchmarked, only a few products could output successful results from our test set. OCR tools are used by companies to identify texts and their positions in images, classify business documents according to subjects, or conduct key-value pairing within documents. Based on OCR results, other technology companies build applications like document automation. For all these business cases, accurate text recognition is critical for an OCR product.
AI In Oil And Gas, Unlocking The Value Of Data - AI Summary
Daniel Faggella: So, Lorena, I want to be able to dive into these various use cases of how artificial intelligence can start to unlock the value of data in the oil and gas space, and make this really tangible. I know the first category we wanted to talk about was really around the value of subsurface data, that there's a lot of subsurface data, obviously, in the oil and gas domain. Lorena Pelegrín: And we see that AI or ML can help these teams find the data and process the data much, much faster. Yeah, and I imagine a good deal of this has to do with, tell me if I'm wrong here, Lorena, but having an understanding of your company from working with you guys for a little while, I would imagine that the digitization of these myriad, somewhat chunky paper forms is one part of the process here, using some kind of optical character recognition and working with historical records and maybe consolidating and digitizing that. Daniel Faggella: But you let me know, Lorena, where does M&A, where does this data come in, in terms of the real value for potential M&A? Daniel Faggella: So Drone Deploy, for example, was on talking about what they do in the energy space with drones and video data to look at and inspect assets.
Here Are the Top 10 Ted Talks on AI That Are a Must-Watch
Now that everything is going digital, Ted Talks play a major role in educating and sharing knowledge with a wider audience. These engaging talks have captured people's attention, and they do not take much time. Instead, they present ideas in a concise, interactive form that hooks the audience rather than boring it. Ted Talks cover a wide variety of themes and topics, and technology is one of them. The series has a large archive of talks on artificial intelligence.
Text to Speech System for Multi-Speaker Setting
What would you want to do if you could generate the voice of your favorite celebrity? Before I get ahead of myself, let me clearly define the objective of this blog. Given text and some voice clips of the desired speaker (say, Beyonce), I want my AI to output an audio clip where Beyonce is speaking the text that I input to this code. So essentially, this is the same Text To Speech (TTS) problem we saw earlier but with an added constraint to output the speech in a particular speaker's voice. In this blog, I share two methods that can complete our task, and I will be comparing these two methods at the end.
Purchase Order (PO) Matching - Automate with AI
PO matching is the process of connecting a purchase order (PO), issued by a client to indicate the types, quantities, and agreed prices of products or services, to the invoice issued by a vendor for their delivery. The goal of PO matching is to ensure timely vendor payments, correct accounting of costs, and easy detection of fraudulent practices. PO matching involves several steps, including receipt of the invoice, capture of data, verification against the purchase order, matching of the parameters, and resolution of any discrepancies. Invoice processing and PO matching are complex, time-consuming, and resource-intensive processes when performed manually, especially in scaled-up business activities. Even in departments where information is digitized in Enterprise Resource Planning (ERP) applications, a significant amount of human labour is required; from the time an invoice is raised or received to its entry into the ERP application, accounts payable personnel perform a seemingly endless list of chores.
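The verification and matching steps can be sketched as a simple two-way match between PO lines and invoice lines. This is a minimal illustration and not the article's implementation: the `Line` record, the `match_invoice_to_po` function, and the `price_tolerance` parameter are all assumptions made for the example, and real systems typically also check goods receipts, taxes, and currencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """One line of a purchase order or invoice (hypothetical schema)."""
    item: str
    quantity: int
    unit_price: float


def match_invoice_to_po(po_lines, invoice_lines, price_tolerance=0.01):
    """Return a list of discrepancies; an empty list means the invoice matches."""
    po_by_item = {line.item: line for line in po_lines}
    issues = []
    for inv in invoice_lines:
        po = po_by_item.get(inv.item)
        if po is None:
            issues.append(f"{inv.item}: not on purchase order")
            continue
        if inv.quantity > po.quantity:
            issues.append(
                f"{inv.item}: invoiced quantity {inv.quantity} exceeds ordered {po.quantity}"
            )
        if abs(inv.unit_price - po.unit_price) > price_tolerance:
            issues.append(
                f"{inv.item}: unit price {inv.unit_price:.2f} differs from agreed {po.unit_price:.2f}"
            )
    return issues


po = [Line("widget", 10, 2.50), Line("bolt", 100, 0.10)]
invoice = [Line("widget", 10, 2.50), Line("bolt", 120, 0.12), Line("nut", 5, 0.05)]
for issue in match_invoice_to_po(po, invoice):
    print(issue)
```

Invoices that return an empty list could be approved automatically, leaving only the flagged ones for accounts payable staff to resolve.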
How to detect online trends without web scraping
To get text information from the content of each screenshot, we will apply text recognition to these images. Our goal is not only to obtain the words used on the page but also their weights (understood as a measure of their relevance or importance). Thanks to that, we will be able to generate a word cloud, where word size will signal how exposed a word was on the site. Pytesseract is an optical character recognition (OCR) tool for Python. It will recognize and "read" the text embedded in screenshots.
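The idea of extracting words together with weights can be sketched as follows. This is one possible approach, not necessarily the article's: it assumes that the rendered height of a word is a reasonable proxy for how exposed it was, and it sums the bounding-box heights that `pytesseract.image_to_data` reports for each word. The helper names `word_weights` and `ocr_screenshot` and the `min_conf` threshold are hypothetical.

```python
from collections import defaultdict


def word_weights(ocr_data, min_conf=60.0):
    """Sum the bounding-box heights of each recognized word as a rough weight.

    `ocr_data` is the dictionary returned by pytesseract.image_to_data with
    output_type=Output.DICT (parallel lists under "text", "height", "conf").
    """
    weights = defaultdict(float)
    for text, height, conf in zip(ocr_data["text"], ocr_data["height"], ocr_data["conf"]):
        word = text.strip().lower()
        # Skip empty layout rows and low-confidence recognitions.
        if not word or float(conf) < min_conf:
            continue
        weights[word] += height
    return dict(weights)


def ocr_screenshot(path):
    """Run Tesseract on one screenshot and return word-level data."""
    # Imported lazily so word_weights can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )


# Hand-written sample in the same shape as the OCR output, for illustration only.
sample = {
    "text": ["", "Sale", "sale", "now", "blurry"],
    "height": [0, 40, 20, 12, 30],
    "conf": ["-1", 95, 91.5, 88, 10],
}
print(word_weights(sample))
# prints: {'sale': 60.0, 'now': 12.0}
```

The resulting dictionary maps each word to a weight, which is the kind of input a word cloud library expects (for example, the `wordcloud` package's `generate_from_frequencies`).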
Simple Transparent Adversarial Examples
There has been a rise in the use of Machine Learning as a Service (MLaaS) Vision APIs, as they offer multiple services including pre-built models and algorithms that would otherwise take a huge amount of resources to build from scratch. As these APIs get deployed for high-stakes applications, it is very important that they are robust to different manipulations. Recent works have focused only on typical adversarial attacks when evaluating the robustness of vision APIs. We propose two new aspects of adversarial image generation methods and use them to evaluate the robustness of Google Cloud Vision API's optical character recognition service and of object detection APIs deployed in real-world settings such as sightengine.com, picpurify.com, Google Cloud Vision API, and Microsoft Azure's Computer Vision API. Specifically, we go beyond the conventional small-noise adversarial attacks and introduce secret embedding and transparent adversarial examples as a simpler way to evaluate robustness. These methods are so straightforward that even non-specialists can craft such attacks. As a result, they pose a serious threat where APIs are used for high-stakes applications. Our transparent adversarial examples successfully evade state-of-the-art object detection APIs such as Azure Cloud Vision (attack success rate 52%) and Google Cloud Vision (attack success rate 36%). 90% of the images have a secret embedded text that successfully fools the vision of time-limited humans but is detected by Google Cloud Vision API's optical character recognition. Complementing current research, our results provide simple but unconventional methods for robustness evaluation.