AITopics | Optical Character Recognition


Optical Character Recognition

Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. PATTERN RECOGNITION BY MACHINE. In Computers & Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.


Innovation Award Honorees - CES 2022

#artificialintelligence Jan-4-2022, 23:55:23 GMT

OrCam MyEye PRO is a wearable assistive technology device for people who are blind, visually impaired or have reading challenges. It's lightweight, finger-size and magnetically mounts on eyeglass frames. The device instantly reads aloud any printed text (books, menus, signs) and digital screens (computer, smartphone), recognizes faces, and identifies products/bar codes, money notes and colors – all in real time and offline. The interactive Smart Reading feature enables users to tailor their assistive reading experience, and Orientation assists with guidance and identification of objects. Newly released "Hey OrCam" enables control of all device features and settings hands-free, using voice commands.

innovation award honoree

#artificialintelligence

Genre: Personal > Honors > Award (0.40)

Technology:

Information Technology > Artificial Intelligence > Assistive Technologies (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.69)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.69)


Guided-TTS: Text-to-Speech with Untranscribed Speech

Kim, Heeseung, Kim, Sungwon, Yoon, Sungroh

arXiv.org Artificial Intelligence Dec-7-2021

Most neural text-to-speech (TTS) models require paired data from the desired speaker for high-quality speech synthesis, which limits the usage of large amounts of untranscribed data for training. In this work, we present Guided-TTS, a high-quality TTS model that learns to generate speech from untranscribed speech data. Guided-TTS combines an unconditional diffusion probabilistic model with a separately trained phoneme classifier for text-to-speech. By modeling the unconditional distribution for speech, our model can utilize the untranscribed data for training. For text-to-speech synthesis, we guide the generative process of the unconditional DDPM via phoneme classification to produce mel-spectrograms from the conditional distribution given transcript. We show that Guided-TTS achieves comparable performance with the existing methods without any transcript for LJSpeech. Our results further show that a single speaker-dependent phoneme classifier trained on multispeaker large-scale data can guide unconditional DDPMs for various speakers to perform TTS.
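The guidance step described in this abstract can be written compactly. By Bayes' rule, the score of the conditional distribution splits into the unconditional score plus the gradient of the classifier's log-probability. The notation is an assumption made here for illustration and is not taken from the abstract: x_t is the noisy mel-spectrogram at diffusion step t, y is the phoneme sequence, and s is a hypothetical guidance scale (s = 1 recovers Bayes' rule exactly).

```latex
\nabla_{x_t} \log p(x_t \mid y)
  = \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p(y \mid x_t),
\qquad
\text{guided score} \approx \nabla_{x_t} \log p(x_t) + s \, \nabla_{x_t} \log p(y \mid x_t)
```

The first term needs only untranscribed speech to train, which is why the unconditional model and the phoneme classifier can be trained separately.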

classifier, guided-tts, phoneme classifier, (16 more...)

arXiv.org Artificial Intelligence

2111.11755

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images

Li, Zekun, Chiang, Yao-Yi, Tavakkol, Sasan, Shbita, Basel, Uhl, Johannes H., Leyk, Stefan, Knoblock, Craig A.

arXiv.org Artificial Intelligence Dec-2-2021

Historical maps contain detailed geographic information difficult to find elsewhere covering long periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at https://github.com/kartta-labs/Project.
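The word-versus-phrase problem mentioned in this abstract ("Black" and "Mountain" vs. "Black Mountain") can be illustrated with a minimal sketch. This is not mapKurator's method: it is a simple greedy heuristic that assumes horizontal, axis-aligned word boxes, and the `WordBox` type, the `merge_into_phrases` function, and both threshold parameters are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """One OCR word with its bounding box (hypothetical type for this sketch)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def merge_into_phrases(words, max_gap_ratio=1.0, max_y_shift_ratio=0.5):
    """Greedily join words that sit side by side on the same text line.

    Both thresholds are expressed relative to the height of the previous
    word and are illustrative assumptions, not tuned values.
    """
    phrases = []  # each phrase is a list of WordBox, ordered left to right
    for word in sorted(words, key=lambda b: b.x):
        for phrase in phrases:
            last = phrase[-1]
            gap = word.x - (last.x + last.w)  # horizontal space between boxes
            same_line = abs(word.y - last.y) <= max_y_shift_ratio * last.h
            if same_line and 0 <= gap <= max_gap_ratio * last.h:
                phrase.append(word)
                break
        else:
            # No existing phrase is close enough, so start a new one.
            phrases.append([word])
    return [" ".join(b.text for b in phrase) for phrase in phrases]


words = [
    WordBox("Mountain", x=60, y=10, w=80, h=12),
    WordBox("Black", x=10, y=11, w=45, h=12),
    WordBox("Creek", x=300, y=200, w=50, h=12),
]
phrases = merge_into_phrases(words)
print(phrases)
```

Labels on real historical maps are often curved, rotated, or widely letter-spaced, so a horizontal-gap rule like this one would miss many of them; it only shows why grouping words is a separate step after OCR.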

information, location phrase, text region, (15 more...)

arXiv.org Artificial Intelligence

2112.01671

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)


Here's how AI can transform the lives of disabled

#artificialintelligence Dec-1-2021, 07:25:09 GMT

Many believe that artificial intelligence is a futuristic concept that we only see in sci-fi movies with humanoid robots and holograms. However, it is becoming rooted in our reality, affecting various fields and groups, including persons with disabilities. Accessibility and inclusivity are genuinely revolutionized, thanks to artificial intelligence! People with disabilities can substantially enhance their daily life thanks to AI technology solutions. We've already shown how smartphones can be tools for people with vision impairments.

advancement, artificial intelligence, disability, (9 more...)

#artificialintelligence

Industry:

Media (0.36)
Health & Medicine (0.31)

Technology:

Information Technology > Artificial Intelligence > Robots (0.56)
Information Technology > Communications > Mobile (0.52)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.50)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.32)


Book Metadata and Cover Retrieval Using OCR and Google Books API - KDnuggets

#artificialintelligence Nov-28-2021, 15:35:17 GMT

Most of the time, the raw data that we need for our data science project is not organized in a neat, well-structured, and insightful table. Rather, it is sometimes stored as text in a scanned document. Words in the document must then be extracted one by one to form a text-formatted data cell. This is the task performed by Optical Character Recognition (OCR). As you read the words of this article, be they text or numbers, your eyes are able to process them by recognizing light and dark patterns that make up characters (e.g., letters, numbers, punctuation marks, etc.).

configuration, isbn code, node, (10 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.71)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.69)


Computer Vision: Python OCR & Object Detection Quick Starter

#artificialintelligence Nov-27-2021, 21:32:57 GMT

This is the third course from my Computer Vision series. Image Recognition, Object Detection, Object Recognition and Optical Character Recognition are among the most used applications of Computer Vision. Using these techniques, the computer will be able to recognize and classify either the whole image or multiple objects inside a single image, predicting the class of each object with a percentage accuracy score. Using OCR, it can also recognize text in images and convert it to a machine-readable format such as plain text or a document. Object Detection and Object Recognition are widely used in many simple applications and also in complex ones like self-driving cars.

image recognition, optical character recognition, recognition, (11 more...)

#artificialintelligence

Genre: Instructional Material (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.67)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.55)


Guided-TTS: Text-to-Speech with Untranscribed Speech - Technology Org

#artificialintelligence Nov-27-2021, 10:05:57 GMT

Neural text-to-speech (TTS) models are successfully used to generate high-quality human-like speech. However, most TTS models can be trained only if transcribed data from the desired speaker is available. That means that long-form untranscribed data, such as podcasts, cannot be used to train existing models. A recent paper on arXiv proposes an unconditional diffusion-based generative model. It is trained on untranscribed data and leverages a phoneme classifier for text-to-speech synthesis.

artificial intelligence, optical character recognition, untranscribed speech, (11 more...)

#artificialintelligence

Genre: Research Report (0.36)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.98)
Information Technology > Artificial Intelligence > Assistive Technologies (0.98)


Disney adds beloved characters as text-to-speech voices in TikTok – and bans them from saying 'lesbian' or 'gay'

The Independent - Tech Nov-16-2021, 11:34:42 GMT

A text-to-speech TikTok voice made by Disney that made users sound like Rocket Raccoon does not allow users to 'say' words like "gay", "lesbian", or "queer". Numerous posts by users showed the feature failing to say the LGBTQ terms before it was quietly changed to allow the words. Words like "bisexual" and "transgender" were allowed by the feature. Originally, Rocket's voice would skip over the words when written normally but would pronounce them phonetically if a user wrote "qweer", for example. Attempts to make it read text that contained only the seemingly prohibited words resulted in an error message saying that text-to-speech was not supported by the language chosen.

lesbian, text-to-speech voice, tiktok, (10 more...)

The Independent - Tech

Country:

North America > United States > New York (0.08)
Europe > Middle East (0.06)
Asia > Middle East > Jordan (0.06)
Africa > Middle East (0.06)

Industry: Information Technology > Services (0.37)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.86)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.86)


Computer Vision: Python OCR & Object Detection Quick Starter

#artificialintelligence Nov-13-2021, 11:53:07 GMT

image recognition, optical character recognition, recognition, (11 more...)

#artificialintelligence

Genre: Instructional Material (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.68)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.67)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.55)


Instagram introduces text-to-speech and voice effects for Reels

Engadget Nov-12-2021, 07:15:47 GMT

Instagram was clearly trying to court TikTok users when it launched its short-form video format called Reels. Now, it has introduced two features already widely popular on TikTok, perhaps in hopes that they can convert those who've been hesitating to use Reels due to their absence. One of those tools is text-to-speech, which provides a robotic voiceover for videos. When a user types in text for their videos, they'll now be able to get an auto-generated voice to read it out loud by accessing the feature living inside the Text bubble on the lower left corner of the screen. They then have to choose between the two available voice options before posting their video. While text-to-speech will make Reels more accessible, it's also popular on TikTok just because some find a robotic voice narrating their activities a funny addition to their content.

instagram introduce, introduce text-to-speech and voice effect, reel, (1 more...)

Engadget

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.93)
