AITopics | Optical Character Recognition


Optical Character Recognition

Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge and Ulric Neisser. "Pattern Recognition by Machine." In Computers & Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963, pp. 8-30.


A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition

Soykan, Gürkan, Yuret, Deniz, Sezgin, Tevfik Metin

arXiv.org Artificial Intelligence, Dec-27-2022

This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset, the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We evaluated the performance of state-of-the-art text detection and recognition models on these datasets and found significant improvement in word accuracy and normalized edit distance compared to the text in COMICS. We also created a new dataset called "COMICS Text+", which contains the extracted text from the textboxes in the COMICS dataset. Using the improved text data of COMICS Text+ in the comics processing model from resulted in state-of-the-art performance on cloze-style tasks without changing the model architecture. The COMICS Text+ dataset can be a valuable resource for researchers working on tasks including text detection, recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All the data and inference instructions can be accessed at https://github.com/gsoykan/comics_text_plus.
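The abstract reports word accuracy and normalized edit distance without defining them. As a rough illustration, here is a minimal Python sketch of both metrics, assuming common conventions: word accuracy as the fraction of exact matches, and Levenshtein distance normalized by the length of the longer string. Benchmarks differ on the normalizer (some divide by the ground-truth length instead), so the function names and that choice are assumptions, not the COMICS Text+ evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, gt: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length (one common convention)."""
    longest = max(len(pred), len(gt))
    return 0.0 if longest == 0 else levenshtein(pred, gt) / longest


def word_accuracy(preds: list[str], gts: list[str]) -> float:
    """Fraction of word images whose prediction matches the ground truth exactly."""
    if len(preds) != len(gts):
        raise ValueError("predictions and ground truth must have the same length")
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)
```

With predictions `["HELLO", "W0RLD", "COMICS"]` against ground truth `["HELLO", "WORLD", "COMICS"]`, word accuracy is 2/3, while the one-character misread `"W0RLD"` has a normalized edit distance of 1/5 = 0.2. Edit distance gives partial credit for near-misses that exact-match accuracy ignores.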

machine learning, natural language, pattern recognition

arXiv.org Artificial Intelligence

2212.14674

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

FedEx, UPS warn mail delivery could be interrupted by winter storm as driver safety takes priority

FOX News, Dec-24-2022, 09:07:38 GMT

Fox News correspondent Mike Tobin reports that severe weather disrupts travel plans ahead of the holidays on 'Special Report.' FedEx and UPS announced mail delivery could be interrupted by the massive winter storm moving across the U.S. after key distribution hubs were blasted by the severe weather conditions. On Friday, FedEx posted a statement to its website warning those who used its Express service that the guaranteed delivery date of Dec. 26 may not be met after the Memphis and Indianapolis hubs experienced "substantial" weather disruptions. The shipping company said actions have been taken to lessen any impact on delivery, but the safety of its team members is the "number one priority." "We recognize the importance of deliveries this holiday weekend and are committed to providing service to the best of our ability by implementing contingency measures where it is safe and possible to do so," the statement read.

delivery, mail delivery, winter storm

FOX News

Country:

North America > United States > Indiana > Marion County > Indianapolis (0.26)
North America > United States > Oregon > Lane County > Eugene (0.06)
North America > United States > Kentucky > Jefferson County > Louisville (0.06)

Industry:

Transportation > Freight & Logistics Services (1.00)
Transportation > Ground > Road (0.52)

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)


Verbyl – Text-to-Speech Converter

#artificialintelligence, Dec-24-2022, 05:15:21 GMT

Since the dawn of humanity, people have gathered around the fire to listen to stories… Only in the last 100 years have we become used to watching stories at the cinema, on TV and, later, on YouTube. VIDEOS without a good VOICEOVER will not convert: they will not get you clicks, leads, traffic, or any sales! That's why a VIDEO is not effective without a GOOD VOICEOVER that tells the actual story!

text-to-speech converter, video, voiceover

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.40)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.40)
Information Technology > Artificial Intelligence > Assistive Technologies (0.40)


Bengali Handwritten Digit Recognition using CNN with Explainable AI

Shawon, Md Tanvir Rouf, Tanvir, Raihan, Alam, Md. Golam Rabiul

arXiv.org Artificial Intelligence, Dec-22-2022

Handwritten character recognition is a hot research topic nowadays. If we can convert a handwritten page into a text-searchable document using optical character recognition (OCR), we can easily understand its content without having to read the handwritten original. OCR for English is very common, but it is very hard to find a good-quality OCR application for Bengali. Merging machine learning and deep learning with OCR could be a major contribution to this field. Researchers have proposed a number of strategies for recognizing Bengali handwritten characters using many ML algorithms and deep neural networks, but explanations of their models are not available. In our work, we used various machine learning algorithms and a CNN to recognize handwritten Bengali digits. Some ML models achieved acceptable accuracy, and the CNN achieved high testing accuracy. We applied Grad-CAM as an XAI method to our CNN model, which gave us insights into the model and helped us identify the regions of interest used to recognize a digit in an image.
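Grad-CAM weights each feature map A^k of a convolutional layer by the spatially averaged gradient of the class score y^c, alpha_k^c = (1/Z) * sum over (i, j) of dy^c/dA^k_ij, and then keeps the positive part of the weighted sum, ReLU(sum over k of alpha_k^c * A^k). The NumPy sketch below implements only that combination step. It assumes the activations and gradients have already been extracted from the deep learning framework (for example with a backward hook); it is not the authors' implementation.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine conv feature maps into a Grad-CAM heatmap scaled to [0, 1].

    activations: (K, H, W) feature maps A^k from the chosen conv layer.
    gradients:   (K, H, W) gradients dy^c/dA^k of the target class score y^c.
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Channel weights alpha_k^c: global-average-pool the gradients over H and W.
    alphas = gradients.mean(axis=(1, 2))                     # shape (K,)
    # Weighted sum over channels; ReLU keeps only evidence FOR the class.
    cam = np.maximum(np.tensordot(alphas, activations, axes=1), 0.0)  # shape (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting map is usually upsampled to the input resolution and overlaid on the digit image, which shows which strokes most increased the predicted class score.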

artificial intelligence, machine learning, pattern recognition

arXiv.org Artificial Intelligence

2212.12146

Country:

Asia > Singapore (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


[2212.08610v1] Huruf: An Application for Arabic Handwritten Character Recognition Using Deep Learning

#artificialintelligence, Dec-19-2022, 01:15:27 GMT

Handwriting recognition has been a field of great interest in the artificial intelligence domain. Due to its broad real-life use cases, it has been widely researched. Prominent work in this field has focused mainly on Latin characters. However, Arabic handwritten character recognition is still relatively unexplored. The inherently cursive nature of Arabic characters and variations in writing style across individuals make the task even more challenging. We identified some probable reasons behind this and proposed a lightweight convolutional neural network (CNN)-based architecture for recognizing Arabic characters and digits. The proposed pipeline consists of 18 layers in total: four layers each for convolution, pooling, batch normalization, and dropout, followed by one global average pooling layer and one dense layer. Furthermore, we thoroughly investigated different hyperparameter choices, such as the optimizer, kernel initializer, and activation function. Evaluated on the publicly available 'Arabic Handwritten Character Dataset (AHCD)' and 'Modified Arabic handwritten digits Database (MadBase)' datasets, the proposed model achieved accuracies of 96.93% and 99.35%, respectively, which is comparable to the state of the art and makes it a suitable solution for real-life end-level applications.
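The layer count in the abstract adds up as 4 × 4 + 2 = 18. The sketch below only enumerates that stack and tracks the spatial size of the feature maps. The block ordering (convolution, batch normalization, pooling, dropout), the use of 2×2 max pooling, and 'same'-padded convolutions are assumptions: the abstract gives the counts but not the order or the pooling type.

```python
# Hypothetical per-block ordering; the abstract only says "four layers each".
BLOCK = ["Conv2D", "BatchNormalization", "MaxPooling2D", "Dropout"]


def build_layer_list(num_blocks: int = 4) -> list[str]:
    """Repeated conv blocks followed by the classifier head named in the abstract."""
    layers = [name for _ in range(num_blocks) for name in BLOCK]
    layers += ["GlobalAveragePooling2D", "Dense"]
    return layers


def spatial_size(input_size: int, num_pools: int, pool: int = 2) -> int:
    """Feature-map side length after repeated pooling ('same'-padded convs keep size)."""
    size = input_size
    for _ in range(num_pools):
        size //= pool  # each non-overlapping pool x pool window shrinks the map
    return size
```

Under these assumptions, a 32×32 input shrinks to a 2×2 map (32 → 16 → 8 → 4 → 2) before global average pooling collapses each channel to a single value. Using global average pooling instead of a flatten layer keeps the parameter count of the dense classifier head small.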

application, arabic handwritten character recognition, deep learning

#artificialintelligence

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)


A Beginner's Guide to Language Models

#artificialintelligence, Dec-8-2022, 16:05:19 GMT

A language model uses machine learning to learn a probability distribution over words, which is used to predict the most likely next word in a sentence given the preceding words. Language models learn from text and can be used for generating original text, predicting the next word in a text, speech recognition, optical character recognition, and handwriting recognition.
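To make "probability distribution over words" concrete, here is a minimal bigram model in Python. It estimates P(next word | previous word) from counts and predicts the most likely continuation. This is a deliberately simple counting model, not the neural models the article covers; the `<s>` start token and the function names are illustrative.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each word follows each previous word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_distribution(counts: dict[str, Counter], prev: str) -> dict[str, float]:
    """Maximum-likelihood estimate of P(next | prev); empty if prev was never seen."""
    following = counts.get(prev.lower(), Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}


def predict_next(counts: dict[str, Counter], prev: str):
    """Most likely next word, or None for an unseen context."""
    dist = next_word_distribution(counts, prev)
    return max(dist, key=dist.get) if dist else None
```

Trained on "the cat sat on the mat", "the cat ate the fish" and "a dog sat on the rug", the model gives P(cat | the) = 2/5 and predicts "cat" after "the". In OCR, the same kind of score can help choose between visually similar readings of a word; practical systems add smoothing so that unseen word pairs do not get zero probability.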

beginner, language model, recognition

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.86)
Information Technology > Artificial Intelligence > Machine Learning (0.86)


AI + OCR - A Key Ingredient To Digital

#artificialintelligence, Dec-7-2022, 14:05:23 GMT

Countless human hours are required to manually extract the data into a machine-readable format. This process is known as ETL (extract, transform, and load). Insurers that can maximize their ETL capabilities have a powerful competitive advantage. Optical character recognition (OCR), also known as text recognition, converts text from scanned paper documents, photos, books, and PDF files into a machine-readable format. The technology isn't new. What is new is coupling OCR with AI and machine-learning algorithms to reliably generate text that can be processed, indexed, and retrieved.
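The ETL steps described above can be illustrated on OCR output. The sketch below assumes OCR has already turned a scanned form into plain text. It extracts fields with regular expressions, transforms them into typed values, and loads them into an in-memory SQLite table. The form layout, field names, and table schema are hypothetical, and the rule-based extraction stands in for the machine-learning components the article describes.

```python
import re
import sqlite3
from datetime import datetime

# Hypothetical OCR output of a scanned claim form; field names are illustrative only.
RAW_OCR = """Policy No: PN-00123
Claimant: Jane Doe
Date of Loss: 03/14/2022
Amount: $1,250.00"""

FIELD_PATTERNS = {
    "policy_no": r"Policy No:\s*(\S+)",
    "claimant": r"Claimant:\s*(.+)",  # '.' stops at the end of the line
    "loss_date": r"Date of Loss:\s*([\d/]+)",
    "amount": r"Amount:\s*\$?([\d,]+\.\d{2})",
}


def extract(text: str) -> dict:
    """Extract: pull raw field strings out of OCR text; missing fields become None."""
    out = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        out[field] = match.group(1).strip() if match else None
    return out


def transform(raw: dict) -> dict:
    """Transform: normalize dates to ISO 8601 and amounts to floats."""
    rec = dict(raw)
    if rec["loss_date"]:
        rec["loss_date"] = datetime.strptime(rec["loss_date"], "%m/%d/%Y").date().isoformat()
    if rec["amount"]:
        rec["amount"] = float(rec["amount"].replace(",", ""))
    return rec


def load(rec: dict, conn: sqlite3.Connection) -> None:
    """Load: insert the cleaned record into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS claims "
        "(policy_no TEXT, claimant TEXT, loss_date TEXT, amount REAL)"
    )
    conn.execute(
        "INSERT INTO claims VALUES (:policy_no, :claimant, :loss_date, :amount)", rec
    )
    conn.commit()
```

Running `load(transform(extract(RAW_OCR)), sqlite3.connect(":memory:"))` stores one row, `('PN-00123', 'Jane Doe', '2022-03-14', 1250.0)`. Real OCR text contains recognition errors (for example `0` read as `O`) that fixed patterns like these miss, which is where the article argues AI and machine learning add value.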

application, insurer, ocr application

#artificialintelligence

Industry: Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.78)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.57)


The Digital Insider

#artificialintelligence, Dec-7-2022, 10:40:14 GMT

While there are all kinds of tips and tools to help you multitask, sometimes the best solutions are hiding in plain sight. A text-to-speech converter is one of those simple things: it can let you listen to documents you have to read while working on something else, or add quality narration to videos and seminars, saving you the time of recording voices yourself. There are myriad applications, and Notevibes is one of the best solutions on the market.

best solution, digital insider

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.72)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.72)
Information Technology > Artificial Intelligence > Assistive Technologies (0.72)


Making your own document scanner in 40 lines of code

#artificialintelligence, Nov-25-2022, 15:30:08 GMT

One of the benefits of being proficient in machine learning is a good understanding of the algorithms behind some of the wonderful features we see on our devices. When Apple released iOS 16, one of the new features was the ability to use the default Notes app as a digital scanner; think of it as a "scanner in your palm", to borrow a similar phrase from the legendary Steve Jobs. Before it was introduced, I had to use other services, usually apps downloaded from the App Store, to scan documents with my phone. Some were paid and some were free, and some of the free apps added a watermark, which somewhat defeats the purpose unless you subscribe to a paid version. Having worked on a number of computer vision projects, I wondered whether some computer vision library or ML algorithm could replicate what my phone does. In this article, we will use a very popular library familiar to most machine learning engineers who work in deep learning, particularly computer vision: OpenCV.
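A typical OpenCV document-scanner pipeline finds the page's four corners, orders them consistently, and warps that quadrilateral onto a flat rectangle. The sketch below covers only the corner-ordering and output-size arithmetic in plain Python, using the common sum/difference heuristic. It is not the article's 40-line implementation, and the function names are hypothetical.

```python
import math

Point = tuple[float, float]


def order_corners(pts: list[Point]) -> list[Point]:
    """Return corners as [top-left, top-right, bottom-right, bottom-left].

    Image coordinates: x grows to the right, y grows downward. The top-left
    corner has the smallest x + y and the bottom-right the largest; the
    top-right has the smallest y - x and the bottom-left the largest.
    """
    if len(pts) != 4:
        raise ValueError("expected exactly four corner points")
    tl = min(pts, key=lambda p: p[0] + p[1])
    br = max(pts, key=lambda p: p[0] + p[1])
    tr = min(pts, key=lambda p: p[1] - p[0])
    bl = max(pts, key=lambda p: p[1] - p[0])
    return [tl, tr, br, bl]


def output_size(corners: list[Point]) -> tuple[int, int]:
    """Width and height of the flattened page: the longer of each pair of opposite edges."""
    tl, tr, br, bl = corners
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return int(round(width)), int(round(height))
```

For detected corners `[(410, 20), (15, 30), (400, 590), (20, 600)]`, the ordered result is `[(15, 30), (410, 20), (400, 590), (20, 600)]` with an output size of `(395, 570)`. The ordered corners and a destination rectangle of that size can then be passed (as `float32` arrays) to OpenCV's `cv2.getPerspectiveTransform`, and the resulting matrix to `cv2.warpPerspective` with `(width, height)` as the output size. The sum/difference heuristic can misorder corners when the page is rotated close to 45°.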

document scanner, library, scanner

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.73)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.41)


PromptTTS: Controllable Text-to-Speech with Text Descriptions

Guo, Zhifang, Leng, Yichong, Wu, Yihan, Zhao, Sheng, Tan, Xu

arXiv.org Artificial Intelligence, Nov-22-2022

Using a text description as a prompt to guide the generation of text or images (e.g., GPT-3 or DALL-E 2) has recently drawn wide attention. Beyond text and image generation, in this work we explore the possibility of using text descriptions to guide speech synthesis. We develop a text-to-speech (TTS) system, dubbed PromptTTS, that takes a prompt with both style and content descriptions as input and synthesizes the corresponding speech. Specifically, PromptTTS consists of a style encoder and a content encoder that extract the corresponding representations from the prompt, and a speech decoder that synthesizes speech according to the extracted style and content representations. Compared with previous work in controllable TTS, which requires users to have acoustic knowledge to understand style factors such as prosody and pitch, PromptTTS is more user-friendly, since text descriptions are a more natural way to express speech style (e.g., "A lady whispers to her friend slowly"). Because no existing TTS dataset includes prompts, to benchmark the PromptTTS task we construct and release a dataset containing prompts with style and content information and the corresponding speech. Experiments show that PromptTTS can generate speech with precise style control and high speech quality. Audio samples and our dataset are publicly available.

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2211.12171

Country: Asia > China (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
