AITopics | Optical Character Recognition


Optical Character Recognition

Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. Pattern Recognition by Machine. In Computers & Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.


Human Reading and the Curse of Dimensionality

Neural Information Processing Systems | Apr-6-2023, 18:31:58 GMT

Whereas optical character recognition (OCR) systems learn to classify single characters, people learn to classify long character strings in parallel, within a single fixation. This difference is surprising because high dimensionality is associated with poor classification learning. This paper suggests that the human reading system avoids these problems because the number of to-be-classified images is reduced by consistent and optimal eye fixation positions, and by character sequence regularities. An interesting difference exists between human reading and optical character recognition (OCR) systems. The input/output dimensionality of character classification in human reading is much greater than that for OCR systems (see Figure 1).

dimensionality, fixation, ocr

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Iterative Learning for Reliable Crowdsourcing Systems

Neural Information Processing Systems | Apr-6-2023, 13:06:54 GMT

Crowdsourcing systems, in which tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large-scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way such as majority voting. In this paper, we consider a general model of such crowdsourcing tasks, and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give new algorithms for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm significantly outperforms majority voting and, in fact, is asymptotically optimal through comparison to an oracle that knows the reliability of every worker.
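The majority-voting baseline mentioned in this abstract can be sketched as follows. This is only an illustration of the baseline, not the paper's iterative algorithm; the task ids, answer labels, and function names are hypothetical, and ties are assumed to be broken in favor of the answer seen first.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer for one task.

    Ties go to the answer that appears first in the list, because
    Counter.most_common keeps first-seen order among equal counts.
    """
    counts = Counter(answers)
    return counts.most_common(1)[0][0]


def aggregate(assignments):
    """Map each task id to the majority answer among its workers' answers."""
    return {task: majority_vote(answers) for task, answers in assignments.items()}


# Hypothetical data: each task was assigned to three workers.
assignments = {
    "task-1": ["A", "A", "B"],
    "task-2": ["B", "B", "B"],
    "task-3": ["A", "B", "B"],
}
labels = aggregate(assignments)
```

Majority voting weights every worker equally, which is why a method that also estimates worker reliability can reach the same target reliability with fewer task assignments.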

iterative learning, majority voting, reliable crowdsourcing system

Neural Information Processing Systems

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.65)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.64)


Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts

Luo, Queenie, Chuang, Yung-Sung

arXiv.org Artificial Intelligence | Apr-6-2023

Scholars in the humanities rely heavily on ancient manuscripts to study history, religion, and socio-political structures in the past. Many efforts have been devoted to digitizing these precious manuscripts using Optical Character Recognition (OCR) technology, but most manuscripts were blemished over the centuries so that an OCR program cannot be expected to capture faded graphs and stains on pages. This work presents a neural spelling correction model built on Google OCR-ed Tibetan Manuscripts to auto-correct OCR-ed noisy output. This paper is divided into four sections: dataset, model architecture, training and analysis. First, we feature-engineered our raw Tibetan etext corpus into two sets of structured data frames: a set of paired toy data and a set of paired real data. Then, we implemented a Confidence Score mechanism into the Transformer architecture to perform spelling correction tasks. According to the Loss and Character Error Rate, our Transformer + Confidence score mechanism architecture proves to be superior to Transformer, LSTM-2-LSTM and GRU-2-GRU architectures. Finally, to examine the robustness of our model, we analyzed erroneous tokens and visualized Attention and Self-Attention heatmaps in our model.
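For readers unfamiliar with the Character Error Rate named in this abstract, here is a minimal sketch under the usual definition: Levenshtein edit distance between the reference and the model output, divided by the reference length. The abstract does not say how the authors compute it, so the definition, the function names, and the choice to treat each Unicode code point as one character are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance divided by the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference string must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, `character_error_rate("abcd", "abxd")` is 0.25: one substitution over four reference characters. For Tibetan text, a single stacked syllable can span several code points, so a real evaluation might segment the strings differently before counting.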

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2304.03427

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.86)


Optical Character Recognition (OCR) MasterClass in Python

#artificialintelligence | Apr-3-2023, 02:56:20 GMT

My name is Raj Chhabria and I am a Computer Science Engineer with a specialization in Data Science. I am an accomplished coder and programmer, and I enjoy using my skills to contribute to the student community through my Udemy courses. Here on Udemy I intend to share my knowledge in the most condensed form through my courses.

masterclass, optical character recognition, python

#artificialintelligence

Genre:

Instructional Material > Online (0.87)
Instructional Material > Course Syllabus & Notes (0.87)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.85)


Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages

Park, Seongyeon, Song, Myungseo, Kim, Bohyung, Oh, Tae-Hyun

arXiv.org Artificial Intelligence | Mar-27-2023

Neural text-to-speech (TTS) models can synthesize natural human speech when trained on large amounts of transcribed speech. However, collecting such large-scale transcribed data is expensive. This paper proposes an unsupervised pre-training method for a sequence-to-sequence TTS model by leveraging large untranscribed speech data. With our pre-training, we can remarkably reduce the amount of paired transcribed data required to train the model for the target downstream TTS task. The main idea is to pre-train the model to reconstruct de-warped mel-spectrograms from warped ones, which may allow the model to learn proper temporal assignment relation between input and output sequences. In addition, we propose a data augmentation method that further improves the data efficiency in fine-tuning. We empirically demonstrate the effectiveness of our proposed method in low-resource language scenarios, achieving outstanding performance compared to competing methods. The code and audio samples are available at: https://github.com/cnaigithub/SpeechDewarping

artificial intelligence, machine learning, optical character recognition

arXiv.org Artificial Intelligence

2303.15669

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.72)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.49)


Optical Character Recognition and Transcription of Berber Signs from Images in a Low-Resource Language Amazigh

Corallo, Levi, Varde, Aparna S.

arXiv.org Artificial Intelligence | Mar-21-2023

The Berber, or Amazigh, language family is a low-resource North African vernacular language spoken by the indigenous Berber ethnic group. It has its own unique alphabet, called Tifinagh, used across Berber communities in Morocco, Algeria, and elsewhere. The Afroasiatic language Berber is spoken by 14 million people, yet lacks adequate representation in education, research, web applications, etc. For instance, there is no option of translation to or from Amazigh / Berber on Google Translate, which hosts over 100 languages today. Consequently, we do not find specialized educational apps, L2 (2nd language learner) acquisition, automated language translation, and remote-access facilities enabled in Berber. Motivated by this background, we propose a supervised approach called DaToBS for Detection and Transcription of Berber Signs. The DaToBS approach entails the automatic recognition and transcription of Tifinagh characters from signs in photographs of natural environments. This is achieved by self-creating a corpus of 1862 pre-processed character images; curating the corpus with human-guided annotation; and feeding it into an OCR model via the deployment of a CNN for deep learning based on computer vision models. We deploy computer vision modeling (rather than language models) because there are pictorial symbols in this alphabet, this deployment being a novel aspect of our work. The DaToBS experimentation and analyses yield over 92 percent accuracy in our research. To the best of our knowledge, ours is among the first few works in the automated transcription of Berber signs from roadside images with deep learning, yielding high accuracy. This can pave the way for developing pedagogical applications in the Berber language, thereby addressing an important goal of outreach to underrepresented communities via AI in education.

corpus, machine learning, natural language

arXiv.org Artificial Intelligence

2303.13549

Country:

Africa > Middle East > Algeria (0.25)
Europe > United Kingdom (0.24)
North America > United States > Illinois (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology (0.47)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


DRISHTI: Visual Navigation Assistant for Visually Impaired

Joshi, Malay, Shukla, Aditi, Srivastava, Jayesh, Rastogi, Manya

arXiv.org Artificial Intelligence | Mar-13-2023

In today's society, where independent living is becoming increasingly important, it can be extremely constricting for those who are blind. Blind and visually impaired (BVI) people face challenges because they need manual support to prompt information about their environment. In this work, we took our first step towards developing an affordable and high-performing eye-wearable assistive device, DRISHTI, to provide visual navigation assistance for BVI people. This system comprises a camera module, ESP32 processor, Bluetooth module, smartphone and speakers. Using artificial intelligence, this system is proposed to detect and understand the nature of the user's path and the obstacles ahead in that path, and then inform BVI users about them via audio output so that they can acquire directions by themselves on their journey. The first step, discussed in this paper, involves establishing a proof-of-concept of achieving the right balance of affordability and performance by testing an initial software integration of a currency detection algorithm on a low-cost embedded arrangement. This work will lay the foundation for our upcoming works toward the goal of assisting as many BVI people as possible around the globe in moving independently.

artificial intelligence, machine learning, optical character recognition

arXiv.org Artificial Intelligence

2303.07451

Country:

Asia > India > Uttar Pradesh (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)
Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Mobile (0.73)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.30)


DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech

Lee, Keon, Park, Kyumin, Kim, Daeyoung

arXiv.org Artificial Intelligence | Mar-12-2023

The majority of current Text-to-Speech (TTS) datasets, which are collections of individual utterances, contain few conversational aspects. In this paper, we introduce DailyTalk, a high-quality conversational speech dataset designed for conversational TTS. We sampled, modified, and recorded 2,541 dialogues from the open-domain dialogue dataset DailyDialog, inheriting its annotated attributes. On top of our dataset, we extend prior work as our baseline, where a non-autoregressive TTS is conditioned on historical information in a dialogue. From the baseline experiment with both general and our novel metrics, we show that DailyTalk can be used as a general TTS dataset, and more than that, our baseline can represent contextual information from DailyTalk. The DailyTalk dataset and baseline code are freely available for academic use under the CC-BY-SA 4.0 license.

artificial intelligence, natural language, optical character recognition

arXiv.org Artificial Intelligence

2207.01063

Country:

North America > Canada > Quebec > Montreal (0.05)
North America > United States (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.62)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.50)


Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities

Wang, Shijun, Guðnason, Jón, Borth, Damian

arXiv.org Artificial Intelligence | Mar-11-2023

State-of-the-art Text-To-Speech (TTS) models are capable of producing high-quality speech. The generated speech, however, is usually neutral in emotional expression, whereas very often one would want fine-grained emotional control of words or phonemes. Although still challenging, the first TTS models have been recently proposed that are able to control voice by manually assigning emotion intensity. Unfortunately, due to the neglect of intra-class distance, the intensity differences are often unrecognizable.

Nevertheless, the nuance of references might be difficult to be captured by these models (e.g. one sad and one depressed reference might produce the same synthesized speech), due to a mismatch between the content or speaker of the reference and synthesized speech, which implies the inflexible controllability of these models. A better approach to achieve fine-grained controllable emotional TTS is by manually assigning intensity labels (such as strong or weak happiness) on words or phonemes, which provides a flexible and efficient way to control the emotion ...

artificial intelligence, intensity representation, machine learning

arXiv.org Artificial Intelligence

2303.01508

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Switzerland > St. Gallen > St. Gallen (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.86)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.61)


AI Voice Generator: Versatile Text to Speech Software

#artificialintelligence | Mar-4-2023, 09:25:30 GMT

For years, creating good voiceovers meant investing hundreds if not thousands of dollars in hiring voice artists, renting a recording studio to get the script recorded, investing in expensive recording equipment (if you are recording from home), and recruiting an audio editor, or outsourcing the entire project to one, to mix the audio and produce a high-quality voiceover. Not to mention the valuable hours dedicated to the entire process. Even after all this, the quality of the produced audio file may be subpar. What if there was an alternative for creating studio-quality voiceovers, and that too from the comfort of your own home? Introducing Murf AI voice generator, which eliminates the entire process of generating voiceovers manually and enables you to quickly produce human-like voiceovers without any specialized hardware or professional help. Leveraging advanced AI algorithms and deep learning, the realistic online voice generator tool allows you to convert text into natural-sounding speech in a matter of just a few minutes.

artificial intelligence, machine learning, murf

#artificialintelligence

Industry: Education > Educational Setting (0.33)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.43)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.43)
Information Technology > Artificial Intelligence > Assistive Technologies (0.43)
