AITopics | Optical Character Recognition


Optical Character Recognition

Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. Pattern Recognition by Machine. In Computers & Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.


Here Are the Top 10 Ted Talks on AI That Are a Must-Watch

#artificialintelligence | May-26-2021, 12:03:39 GMT

In the current scenario, where everything is going digital, TED Talks play a major role in educating and imparting knowledge to a wider audience. These engaging talks have captured people's attention, and they do not consume a lot of time. Instead, they spread ideas in a concise, interactive form that hooks the audience rather than boring it. TED Talks cover a wide variety of themes and topics, and technology is one of them. TED has a large archive of talks on artificial intelligence.

intelligence, machine intelligence, ted talk, (15 more...)

#artificialintelligence

Country: Oceania > Australia (0.05)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Setting > Continuing Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.75)
Information Technology > Artificial Intelligence > Robots (0.73)
Information Technology > Artificial Intelligence > Cognitive Science (0.72)


Text to Speech System for Multi-Speaker Setting

#artificialintelligence | May-26-2021, 07:20:25 GMT

What would you want to do if you could generate the voice of your favorite celebrity? Before I get ahead of myself, let me clearly define the objective of this blog. Given text and some voice clips of the desired speaker (say, Beyonce), I want my AI to output an audio clip where Beyonce is speaking the text that I input to this code. So essentially, this is the same Text To Speech (TTS) problem we saw earlier but with an added constraint to output the speech in a particular speaker's voice. In this blog, I share two methods that can complete our task, and I will be comparing these two methods at the end.

beyonce, dataset, speaker encoder, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.65)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.65)
Information Technology > Artificial Intelligence > Assistive Technologies (0.65)


Purchase Order (PO) Matching - Automate with AI

#artificialintelligence | May-24-2021, 22:35:08 GMT

PO Matching is the process of connecting a purchase order (PO) issued by a client indicating types, quantities, and agreed prices for products/services to the invoice issued by a vendor for its delivery. The goal of PO matching is to ensure timely vendor payments, correct accounting of costs and easy detection of fraudulent practices. PO matching involves several steps, including the receipt of invoice, capture of data, verification with purchase order, matching the parameters, and resolution based on various parameters. Invoice processing and PO matching are complex, time-consuming, and resource-intensive processes when performed manually, especially in scaled-up business activities. Even in departments where there is digitization of information in the form of Enterprise Resource Planning (ERP) applications, a significant amount of human labour is required; from the time an invoice is raised or received to its entry into the ERP application, accounts payable personnel perform a seemingly endless list of chores.

information, invoice, po matching process, (13 more...)

#artificialintelligence

Country: Europe > United Kingdom (0.04)

Industry: Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Enterprise Resource Planning (0.90)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.48)


How to detect online trends without web scraping

#artificialintelligence | May-24-2021, 13:50:20 GMT

To get text information from the content of each screenshot, we will apply text recognition to these images. Our goal is not only to obtain the words used on the page but also their weights (understood as a measure of their relevance or importance). Thanks to that, we will be able to generate a word cloud, where word size will signal how exposed a word was on the site. Pytesseract is an optical character recognition (OCR) tool for Python. It will recognize and "read" the text embedded in screenshots.
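The excerpt does not say how the weights are computed, so the following is only a minimal sketch under one assumption: a word's weight is the summed pixel height of its bounding boxes across all occurrences, so larger and more frequent words score higher. The function names `word_weights` and `screenshot_word_weights` and the `min_conf` threshold are hypothetical; the input is assumed to be the dictionary that `pytesseract.image_to_data` returns with `output_type=pytesseract.Output.DICT`.

```python
from collections import defaultdict


def word_weights(ocr_data, min_conf=60.0):
    """Sum bounding-box heights per word as a rough 'exposure' weight.

    Assumption (not from the article): weight = total pixel height of all
    boxes recognized for a word, so big or repeated words score higher.
    """
    weights = defaultdict(float)
    for text, height, conf in zip(
        ocr_data["text"], ocr_data["height"], ocr_data["conf"]
    ):
        word = text.strip().lower()
        # Tesseract reports conf == -1 for layout boxes that carry no word;
        # depending on the pytesseract version, conf may be a string or a number.
        if not word or float(conf) < min_conf:
            continue
        weights[word] += float(height)
    return dict(weights)


def screenshot_word_weights(path):
    """Run OCR on one screenshot (hypothetical helper).

    Requires Pillow, pytesseract, and a local Tesseract installation.
    """
    from PIL import Image
    import pytesseract

    data = pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )
    return word_weights(data)
```

The resulting word-to-weight mapping could then be handed to a word cloud generator. Keeping the OCR call separate from the weighting makes it easy to swap in a different weighting rule, such as bounding-box area.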

detect online trend, pytesseract, screenshot, (1 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.62)
Information Technology > Data Science > Data Mining > Web Mining (0.40)


Simple Transparent Adversarial Examples

Borkar, Jaydeep, Chen, Pin-Yu

arXiv.org Artificial Intelligence | May-20-2021

There has been a rise in the use of Machine Learning as a Service (MLaaS) Vision APIs as they offer multiple services including pre-built models and algorithms, which otherwise take a huge amount of resources if built from scratch. As these APIs get deployed for high-stakes applications, it's very important that they are robust to different manipulations. Recent works have only focused on typical adversarial attacks when evaluating the robustness of vision APIs. We propose two new aspects of adversarial image generation methods and evaluate them on the robustness of Google Cloud Vision API's optical character recognition service and object detection APIs deployed in real-world settings such as sightengine.com, picpurify.com, Google Cloud Vision API, and Microsoft Azure's Computer Vision API. Specifically, we go beyond the conventional small-noise adversarial attacks and introduce secret embedding and transparent adversarial examples as a simpler way to evaluate robustness. These methods are so straightforward that even non-specialists can craft such attacks. As a result, they pose a serious threat where APIs are used for high-stakes applications. Our transparent adversarial examples successfully evade state-of-the-art object detection APIs such as Azure Cloud Vision (attack success rate 52%) and Google Cloud Vision (attack success rate 36%). 90% of the images have a secret embedded text that successfully fools the vision of time-limited humans but is detected by Google Cloud Vision API's optical character recognition. Complementing current research, our results provide simple but unconventional methods for robustness evaluation.

api, google cloud vision api, neural network, (13 more...)

arXiv.org Artificial Intelligence

2105.09685

Country:

Asia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)


Text To Speech Explained from basic

#artificialintelligence | May-14-2021, 20:45:07 GMT

As the title suggests, in this blog we are going to learn about text to speech (TTS) synthesis. What is the first bell which rings in your mind when you listen to text to speech? For me, it's Alexa, Google Home, Siri, and many other conversational bots that are on an exponential rise currently. Advances in deep learning research have helped us to generate human-like voices, so let's see how we can use that. I'll start with a few definitions, but if you want to understand these more then read this blog first.

decoder, mel-spectrogram, speech explained, (10 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.83)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.83)
Information Technology > Artificial Intelligence > Assistive Technologies (0.83)


Microsoft is testing Xbox party chat accessibility features

Engadget | May-13-2021, 10:26:44 GMT

Microsoft has announced that speech transcription and text-to-speech synthesis are coming to Xbox party chat, starting today for Xbox Insiders. The new features will make it easier for players with hearing or speech difficulties to participate in party chat and are part of an Xbox initiative to improve accessibility. Both features can be found in the "ease of access" tab under "game and chat transcription." With speech-to-text transcription, words spoken in a party are converted into text displayed in an adjustable overlay, as shown above. With text-to-speech enabled, anything you type into party text chat will be read by a synthetic voice to the rest of the party, with a choice of several voices per language.

microsoft, transcription, xbox party chat accessibility feature, (1 more...)

Engadget

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.96)


Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

Popov, Vadim, Vovk, Ivan, Gogoryan, Vladimir, Sadekova, Tasnima, Kudinov, Mikhail

arXiv.org Machine Learning | May-13-2021

Recently, denoising diffusion probabilistic models and generative score matching have shown high potential in modelling complex data distributions, while stochastic calculus has provided a unified point of view on these techniques, allowing for flexible inference schemes. In this paper we introduce Grad-TTS, a novel text-to-speech model with a score-based decoder producing mel-spectrograms by gradually transforming noise predicted by the encoder and aligned with the text input by means of Monotonic Alignment Search. The framework of stochastic differential equations helps us to generalize conventional diffusion probabilistic models to the case of reconstructing data from noise with different parameters and allows us to make this reconstruction flexible by explicitly controlling the trade-off between sound quality and inference speed. Subjective human evaluation shows that Grad-TTS is competitive with state-of-the-art text-to-speech approaches in terms of Mean Opinion Score. We will make the code publicly available shortly.

diffusion probabilistic model, grad-tts, reverse diffusion, (13 more...)

arXiv.org Machine Learning

2105.06337

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > Russia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.91)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.90)


This $4 Mac app extracts text from images and videos for you

Engadget | May-4-2021, 14:55:14 GMT

If you've ever gone through the painstaking process of transcribing text from a video, or begrudgingly typing up the copy from an image, you know the struggle. Not only is this a tedious activity, it's also prone to human error and a total time waster, to boot. Leave the manual work behind and join the thousands of Mac users who simplify their workflows with TextSniper, on sale now for just $4. TextSniper's optical character recognition (OCR) software works fast to detect any text from your screen, whether that's screenshots, images, videos, PDFs or digital documents. Instead of poring over, say, a video, you'll be able to instantly convert that speech into text. Then, you're a simple copy-and-paste away from dropping the content into your notes, messaging app and anywhere else you please.

mac app extract text, textsniper, video, (2 more...)

Engadget

Industry: Information Technology (0.55)

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.95)


Gartner says low-code, RPA, and AI driving growth in 'hyperautomation'

#artificialintelligence | Apr-30-2021, 14:40:07 GMT

Research firm Gartner estimates the market for hyperautomation-enabling technologies will reach $596 billion in 2022, up nearly 24% from the $481.6 billion in 2020. Gartner is expecting significant growth for technology that enables organizations to rapidly identify, vet, and automate as many processes as possible and says it will become a "condition of survival" for enterprises. Hyperautomation-enabling technologies include robotic process automation (RPA), low-code application platforms (LCAP), AI, and virtual assistants. As organizations look for ways to automate the digitization and structuring of data and content, technologies that automate content ingestion, such as signature verification tools, optical character recognition, document ingestion, conversational AI, and natural language technology (NLT), will be in high demand. For example, these tools could be used to automate the process of digitizing and sorting paper records.

gartner, hyperautomation, process automation, (9 more...)

#artificialintelligence

AI-Alerts: 2021 > 2021-05 > AAAI AI-Alert for May 4, 2021 (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.73)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.57)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.37)
