AITopics | Optical Character Recognition

Optical Character Recognition

Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. PATTERN RECOGNITION BY MACHINE. In Computers & Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.

Digital Einstein Experience: Fast Text-to-Speech for Conversational AI

Rownicka, Joanna, Sprenkamp, Kilian, Tripiana, Antonio, Gromoglasov, Volodymyr, Kunz, Timo P

arXiv.org Artificial Intelligence, Jul-21-2021

We describe our approach to creating and delivering a custom voice for a conversational AI use case. More specifically, we provide a voice for a Digital Einstein character, to enable human-computer interaction within the digital conversation experience. To create a voice that fits the context well, we first design a voice character and produce recordings that correspond to the desired speech attributes. We then model the voice. Our solution uses FastSpeech 2 for log-scaled mel-spectrogram prediction from phonemes and Parallel WaveGAN to generate the waveforms. The system accepts character input and produces a speech waveform as output. We use a custom dictionary for selected words to ensure their proper pronunciation. Our proposed cloud architecture enables fast voice delivery, making it possible to talk to the digital version of Albert Einstein in real time.

digital einstein experience, text-to-speech, tts service, (13 more...)

2107.10658

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.54)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.44)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.36)

Robust Learning for Text Classification with Multi-source Noise Simulation and Hard Example Mining

Xu, Guowei, Ding, Wenbiao, Fu, Weiping, Wu, Zhongqin, Liu, Zitao

arXiv.org Artificial Intelligence, Jul-15-2021

Many real-world applications use Optical Character Recognition (OCR) engines to transform handwritten images into transcripts, to which downstream Natural Language Processing (NLP) models are then applied. In this process, OCR engines may introduce errors, so the inputs to downstream NLP models become noisy. Although pre-trained models achieve state-of-the-art performance on many NLP benchmarks, we show that they are not robust to noisy texts generated by real OCR engines. This greatly limits the application of NLP models in real-world scenarios. To improve model performance on noisy OCR transcripts, it is natural to train the NLP model on labelled noisy texts. However, in most cases only labelled clean texts are available. Since there are no handwritten pictures corresponding to the text, it is impossible to use the recognition model directly to obtain noisy labelled data. People can be employed to copy texts and take pictures, but this is extremely expensive given the amount of data needed for model training. Consequently, we are interested in making NLP models intrinsically robust to OCR errors in a low-resource manner. We propose a novel robust training framework which 1) employs simple but effective methods to directly simulate natural OCR noise from clean texts, 2) iteratively mines hard examples from a large number of simulated samples for optimal performance, and 3) employs a stability loss so that the model learns noise-invariant representations. Experiments on three real-world datasets show that the proposed framework boosts the robustness of pre-trained models by a large margin. We believe that this work can greatly promote the application of NLP models in real-world scenarios, although the algorithm we use is simple and straightforward. We make our code and three datasets publicly available at https://github.com/tal-ai/Robust-learning-MSSHEM.
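
As a rough illustration of the first step in the abstract above (simulating OCR noise directly from clean text), the following minimal Python sketch applies random character-level substitutions, deletions, and insertions. It is not the authors' implementation: the `CONFUSIONS` table, the `simulate_ocr_noise` function, and the `rate` parameter are hypothetical names chosen for this example, and the paper's actual noise sources may differ.

```python
import random

# Hypothetical confusion table (an assumption, not taken from the paper):
# visually similar characters that an OCR engine might plausibly swap.
CONFUSIONS = {
    "o": ["0"], "0": ["o"],
    "l": ["1", "I"], "1": ["l"],
    "s": ["5"], "5": ["s"],
    "m": ["rn"],
}

def simulate_ocr_noise(text, rate=0.1, seed=None):
    """Return a noisy copy of `text`; each character is corrupted with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if rng.random() >= rate:
            out.append(ch)  # keep the character unchanged
            continue
        op = rng.choice(["substitute", "delete", "insert"])
        if op == "substitute" and ch.lower() in CONFUSIONS:
            out.append(rng.choice(CONFUSIONS[ch.lower()]))
        elif op == "delete":
            continue  # drop the character
        elif op == "insert":
            out.append(ch)
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
        else:
            out.append(ch)  # no known confusion for this character; keep it
    return "".join(out)

clean = "the solution is 105 meters"
noisy = simulate_ocr_noise(clean, rate=0.2, seed=7)
```

Pairs of `(clean, noisy)` text with the original label attached could then serve as simulated training samples; hard example mining and the stability loss mentioned in the abstract are not covered by this sketch.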

ocr transcript, stability loss, transcript, (13 more...)

2107.07113

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.48)
(3 more...)

Anyline nabs $20M to automate mobile data capture for enterprises

#artificialintelligence, Jul-7-2021, 12:15:15 GMT

Anyline, a company that builds mobile data capture and scanning technologies for multiple industries, has raised $20 million. Founded in Vienna, Austria, in 2013, Anyline has developed a range of data capture products such as barcode scanning, optical character recognition (OCR)-powered document scanning, biometric face authentication, serial number scanning, and even driver's license scanning, which enables retailers to easily verify a person's age and identity at the point of sale or at curbside pickup. Elsewhere, police forces can integrate Anyline's technology to scan all manner of IDs and vehicle license plates to verify drivers instantly, which not only speeds things up but also reduces the chance of errors introduced by traditional manual processes such as typing or relaying data over the radio. This, according to Anyline CEO and cofounder Lukas Kinigadner, is perhaps the number one benefit Anyline brings to organizations across the spectrum.

anyline, automate mobile data capture, mobile data capture, (7 more...)

Country: Europe > Austria > Vienna (0.59)

Industry: Retail (0.76)

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.73)

9 Top AI and Machine Learning Trends for 2021

#artificialintelligence, Jun-30-2021, 20:18:34 GMT

AI is getting better at supporting multiple modalities within a single ML model, such as text, vision, speech and IoT sensor data. Developers are starting to find innovative ways to combine modalities to improve common tasks like document understanding, said David Talby, founder and CTO of John Snow Labs, an NLP tools provider. For example, patient data collected and processed by healthcare systems can include visual lab results, genetic sequencing reports, clinical trial forms and other scanned documents. The layout and presentation style of this information, if done right, can help doctors better understand what they're looking at. AI algorithms trained using multi-modal techniques such as machine vision and optical character recognition could optimize the presentation of results, improving medical diagnosis.

ai and machine learning trend, modality, top ai

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.67)

Top Data Science News of the Week

#artificialintelligence, Jun-30-2021, 13:16:10 GMT

Synechron, a leading digital transformation consulting firm, launched an annual report, Top Strategic Technology Trends. The report named data science as one of its eight major trends for 2021, and the company's experts pointed out three critical trends. The first concerns the business applications of self-supervised models, where AI teaches itself to solve problems without human classification of data. The second refers to the increased adoption of Natural Language Generation, which uses AI to create many of the documents that are produced by hand every day. The third and final trend concerns technologies like ML, Optical Character Recognition, and NLP that will increase efficiency, reduce costs, and detect financial crimes during KYC.

data science, data science news, top data science news, (9 more...)

Country:

North America > United States > Florida > Miami-Dade County (0.05)
North America > United States > California > San Diego County > San Diego (0.05)
Asia > India (0.05)

Genre: Research Report (0.36)

Industry: Education > Educational Setting (0.33)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.56)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.56)

Official - OCR - Convert image to text

#artificialintelligence, Jun-27-2021, 11:00:44 GMT

Marks-Man submitted a new resource: [TheJavaSea] OCR - Convert image to text - Perform basic OCR (Optical Character Recognition) in English and 100+ other...

ocr, starter mark-man start date thursday

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.53)

How To Capitalize Words Using AI

#artificialintelligence, Jun-20-2021, 08:40:24 GMT

Have you ever faced a large corpus of text with missing capitalization? You would need to uppercase thousands of words before publishing the text. In this post, I demonstrate how to repair case information in documents automatically. Truecasing is the natural language processing problem of finding the proper capitalization of words within a text where such information is unavailable. Use cases include transcripts from various audio sources, automatic speech recognition, optical character recognition, medical records, online messaging, and gaming.
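
To make the truecasing idea concrete, here is a minimal frequency-based baseline in Python. It is a sketch under stated assumptions, not the method from the linked post: the functions `train_truecaser` and `truecase` are hypothetical names for this example, and the approach simply restores the most frequent surface form of each word seen in a small cased corpus, then capitalizes sentence starts.

```python
import re
from collections import Counter, defaultdict

def train_truecaser(cased_sentences):
    """For each lowercased word, count the surface forms seen away from sentence starts."""
    counts = defaultdict(Counter)
    for sentence in cased_sentences:
        tokens = re.findall(r"\w+", sentence)
        for tok in tokens[1:]:  # skip the first word: its capital is positional
            counts[tok.lower()][tok] += 1
    return counts

def truecase(text, counts):
    """Restore the most frequent casing of known words and capitalize sentence starts."""
    result = []
    sentence_start = True
    # Alternate between runs of word characters and runs of everything else,
    # so that the original spacing and punctuation are preserved.
    for tok in re.findall(r"\w+|\W+", text):
        if re.match(r"\w", tok):
            forms = counts.get(tok.lower())
            word = forms.most_common(1)[0][0] if forms else tok
            if sentence_start:
                word = word[0].upper() + word[1:]
            result.append(word)
            sentence_start = False
        else:
            if any(p in tok for p in ".!?"):
                sentence_start = True
            result.append(tok)
    return "".join(result)

corpus = ["We met Alice in Paris.", "Later Alice flew to Paris again."]
counts = train_truecaser(corpus)
restored = truecase("alice went to paris. she met bob.", counts)
# restored == "Alice went to Paris. She met bob."
```

Unknown words such as "bob" are left as they are, which shows the main limitation of a lookup baseline: it cannot recover names it has never seen.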

capitalization, capitalize word, recognition, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.62)

Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction using Sequences

Wang, Jiapeng, Wang, Tianwei, Tang, Guozhi, Jin, Lianwen, Ma, Weihong, Ding, Kai, Huang, Yichao

arXiv.org Artificial Intelligence, Jun-20-2021

Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain text and then use token-level entity annotations as supervision to train a sequence tagging model. However, this incurs high annotation costs and may be exposed to label confusion, and OCR errors also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning framework called TCPN (Tag, Copy or Predict Network), which introduces 1) an efficient encoder to simultaneously model the semantic and layout information in 2D OCR results; 2) a weakly-supervised training strategy that uses only key information sequences as supervision; and 3) a flexible and switchable decoder with two inference modes: one (Copy or Predict Mode) outputs key information sequences of different categories by copying a token from the input or predicting one at each time step, and the other (Tag Mode) directly tags the input sequence in a single forward pass. Our method achieves new state-of-the-art performance on several public benchmarks, demonstrating its effectiveness.

decoder, sequence, textlattice, (14 more...)

2106.10681

Country:

Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.86)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.68)
Information Technology > Data Science > Data Mining > Text Mining (0.62)

Best OCR by Text Extraction Accuracy in 2021

#artificialintelligence, Jun-9-2021, 07:55:34 GMT

Optical Character Recognition (OCR) is a field of machine learning that is specialized in distinguishing characters within images like scanned documents, printed books, or photos. Although it is a mature technology, there are still no OCR products that can recognize all kinds of text 100% accurately. Among the products that we benchmarked, only a few products could output successful results from our test set. OCR tools are used by companies to identify texts and their positions in images, classify business documents according to subjects, or conduct key-value pairing within documents. Based on OCR results, other technology companies build applications like document automation. For all these business cases, accurate text recognition is critical for an OCR product.
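
The summary above does not say how text extraction accuracy was measured. One common choice, shown here purely as an assumption, is character-level accuracy derived from the Levenshtein edit distance between a reference transcript and the OCR output. The function names `edit_distance` and `character_accuracy` are hypothetical and belong to this sketch only.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def character_accuracy(reference, hypothesis):
    """1 minus the character error rate, clipped at 0 for very poor output."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    cer = edit_distance(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)

score = character_accuracy("Invoice 2021", "lnvoice 2O21")
```

In this example two of the twelve reference characters are misread, so `score` is roughly 0.83. Word-level or field-level measures are also used in practice and can rank products differently.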

accuracy, benchmark, text accuracy, (11 more...)

Genre: Research Report > New Finding (0.30)

Industry: Information Technology (0.90)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.90)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)

AI In Oil And Gas, Unlocking The Value Of Data - AI Summary

#artificialintelligence, May-30-2021, 07:20:18 GMT

Daniel Faggella: So, Lorena, I want to be able to dive into these various use cases of how artificial intelligence can start to unlock the value of data in the oil and gas space, and make this really tangible. I know the first category we wanted to talk about was really around the value of subsurface data, that there's a lot of subsurface data, obviously in the oil and gas domain. Lorena Pelegrín: And we see that AI or our ML can help these teams find the data and process the data much, much faster. Yeah, and I imagine a good deal of this has to do with, tell me if I'm wrong here, Lorena, but having an understanding of your company from working with you guys for a little while, I would imagine that the digitization of these myriad, somewhat chunky paper forms is one part of the process here, using some kind of optical character recognition stuff and working with historical records and maybe congealing and digitizing that. Daniel Faggella: But you let me know, Lorena, where does M&A, where does this data come in, in terms of the real value for potential M&A? Daniel Faggella: So Drone Deploy, for example, was on talking about what they do in the energy space with drones and video data to look at and inspect assets.

daniel faggella, lorena pelegrín, optical character recognition, (16 more...)

Industry: Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.58)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.38)
