AITopics | Optical Character Recognition

Optical Character Recognition

Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. PATTERN RECOGNITION BY MACHINE. In Computers & Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.

This AI powered text-to-speech tool makes voiceovers sound true to life

PCWorld | Jul-10-2022, 14:43:43 GMT

The problem is that they often make videos sound robotic and lifeless, which is never good. Wish there was a better option? Then check out Speechnow, an AI-powered tool that makes video voiceovers sound true to life. Speechnow is a browser app that uses an AI algorithm to convert text into spoken word recordings. And it makes those recordings sound as if an actual human spoke them, so it's ideal for people who post a lot of videos to their socials.

life ai text, speech sn001, speechnow

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.50)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.50)
Information Technology > Artificial Intelligence > Assistive Technologies (0.50)

Towards Multimodal Vision-Language Models Generating Non-Generic Text

Robbins, Wes, Zohourianshahzadi, Zanyar, Kalita, Jugal

arXiv.org Artificial Intelligence | Jul-8-2022

Vision-language models can assess visual context in an image and generate descriptive text. While the generated text may be accurate and syntactically correct, it is often overly general. To address this, recent work has used optical character recognition to supplement visual information with text extracted from an image. In this work, we contend that vision-language models can benefit from additional information that can be extracted from an image but is not used by current models. We modify previous multimodal frameworks to accept relevant information from any number of auxiliary classifiers. In particular, we focus on person names as an additional set of tokens and create a novel image-caption dataset to facilitate captioning with person names. The dataset, Politicians and Athletes in Captions (PAC), consists of captioned images of well-known people in context. By fine-tuning pretrained models with this dataset, we demonstrate a model that can naturally integrate facial recognition tokens into generated text by training on limited data. For the PAC dataset, we provide a discussion on collection and baseline benchmark scores.

caption, classifier, dataset, (15 more...)

arXiv: 2207.04174

Country:

North America > United States > Montana (0.04)
North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.68)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.49)

Bhasha Daan: A crowdsourcing initiative for Indian languages

#artificialintelligence | Jul-6-2022, 15:45:30 GMT

Bhasha Daan: A crowdsourcing initiative for Indian languages that will be as Indian as you and I. We invite you to contribute data to develop Speech Recognition, Text-to-Speech, Machine Translation and Optical Character Recognition for Indian languages.

bhasha daan, indian language

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.87)
Information Technology > Communications > Social Media > Crowdsourcing (0.60)
Information Technology > Artificial Intelligence > Speech (0.53)

Optical Character Recognition Technology for Business Owners

#artificialintelligence | Jun-30-2022, 02:20:43 GMT

Early versions of OCR had to be trained with images of each character and could only work with one font at a time. Modern machine learning algorithms improve the text recognition process and provide higher recognition accuracy for most fonts, regardless of the input data format. Advances in machine learning (ML) have given new impetus to the development of OCR, significantly increasing the number of its applications. With enough training data, an OCR machine learning algorithm can now be applied to any real-world scenario that requires identifying and transforming text. Examples include receipt scanning, scanning printed text for conversion into synthetic speech, traffic sign recognition, and license plate recognition.

character recognition, ocr, recognition, (12 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom (0.04)

Industry:

Banking & Finance (0.69)
Health & Medicine > Health Care Providers & Services (0.48)
Information Technology > Security & Privacy (0.47)
Health & Medicine > Health Care Technology > Medical Record (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech

Cho, Hyunjae, Jung, Wonbin, Lee, Junhyeok, Woo, Sang Hoon

arXiv.org Artificial Intelligence | Jun-24-2022

In this paper, we present SANE-TTS, a stable and natural end-to-end multilingual TTS model. Because of the difficulty of obtaining a multilingual corpus for a given speaker, training a multilingual TTS model with monolingual corpora is unavoidable. We introduce a speaker regularization loss that improves speech naturalness during cross-lingual synthesis, as well as domain adversarial training, which is applied in other multilingual TTS models. Furthermore, with the speaker regularization loss added, replacing the speaker embedding with a zero vector in the duration predictor stabilizes cross-lingual inference. With this replacement, our model generates speech with moderate rhythm regardless of the source speaker in cross-lingual synthesis. In MOS evaluation, SANE-TTS achieves a naturalness score above 3.80 in both cross-lingual and intralingual synthesis, where the ground truth score is 3.99. Also, SANE-TTS maintains speaker similarity close to that of the ground truth even in cross-lingual inference. Audio samples are available on our web page.

artificial intelligence, machine learning, natural language, (18 more...)

doi: 10.21437/Interspeech.2022-46

arXiv: 2206.12132

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.56)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Optical Character Recognition using PaddleOCR

#artificialintelligence | Jun-19-2022, 17:25:49 GMT

Reading huge documents can be very tiring and time-consuming. You have probably seen applications where you just take a picture and get key information from the document. This is done by a technique called Optical Character Recognition (OCR), one of the key research areas in AI in recent years. OCR recognizes text in an image and converts it into a machine-readable format by analyzing and understanding its underlying patterns. This blog post will focus on implementing and comparing various OCR algorithms provided by PaddleOCR using just a few lines of code. OCR can recognize handwritten text, printed text and text "in the wild". In short, OCR enables computers to read. But how does OCR work? OCR makes use of deep learning and computer vision techniques.

dataset, ocr, paddleocr, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Opportunities for Optical Character Recognition (OCR) in Insurance - Global IQX

#artificialintelligence | Jun-17-2022, 11:17:01 GMT

A robust OCR process can convert client documents into structured data in a digestible format that can be analyzed for client cross-selling, up-selling, or new business opportunities. OCR programs can assist sales and underwriting teams by automatically extracting and transforming key details from RFPs and lengthy policy documents. OCR enables insurance sales professionals to streamline and drive efficiencies by automatically scrubbing RFP emails, multiple PDF documents, plan booklets, and even scanned images of policy documents for key details that can be transformed into a format appropriate for processing. This data can then be loaded into the insurance company's sales and underwriting systems, like a quoting and rating engine, creating an initial shell quote in seconds. Additionally, many insurance companies still maintain vast quantities of historical data in unstructured and paper formats.

global iqx, insurance, optical character recognition, (3 more...)

Industry: Banking & Finance > Insurance (1.00)

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.85)

Less-Known Facts About AI Voices and Text-to-Speech

#artificialintelligence | Jun-8-2022, 03:05:34 GMT

Voice artificial intelligence is an emerging technology that uses voice commands to interact with humans. The technology is seeing tremendous growth and intense research as engineers explore untapped areas. We are well accustomed to hearing AI voices narrating articles and reports in a monotone. Among the most popular examples of its use are Alexa- and Siri-enabled devices. These devices are gaining significant recognition, and the market for similar products is growing exceptionally.

ai voice, text-to-speech technology, tts technology, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.54)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.54)
Information Technology > Artificial Intelligence > Assistive Technologies (0.54)

Helping Financial Services Tackle the Challenges of Unstructured Data

#artificialintelligence | Jun-6-2022, 11:35:26 GMT

Today, large enterprises are grappling with an onslaught of unstructured data and documents. IDC and Seagate predict that the global data sphere will grow to 163 zettabytes by 2025, and about 80 percent of that will be unstructured. In regulated industries, such as financial services, the challenges posed by unstructured and semi-structured data are exponentially higher. Traditional methods–ranging from manual entry to Optical Character Recognition (OCR)–have proved woefully inadequate. Even more recent and highly heralded methods such as Robotic Process Automation (RPA) have proven to be piecemeal solutions to the challenge.

convus, financial service tackle, unstructured data, (3 more...)

Industry: Banking & Finance > Financial Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.80)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.59)
Information Technology > Data Science > Data Mining > Big Data (0.40)

Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness

#artificialintelligence | Jun-4-2022, 01:24:56 GMT

Document understanding is a key business process in the data-driven economy since documents are central to knowledge discovery and business insights. Converting documents into a machine-processable format is a particular challenge here due to their huge variability in formats and complex structure. Accordingly, many algorithms and machine-learning methods emerged to solve particular tasks such as Optical Character Recognition (OCR), layout analysis, table-structure recovery, figure understanding, etc. We observe the adoption of such methods in document understanding solutions offered by all major cloud providers. Yet, publications outlining how such services are designed and optimized to scale in the cloud are scarce.

cloud service, delivering document conversion, high throughput and responsiveness, (1 more...)

Industry: Information Technology > Services (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.63)
