AITopics | Optical Character Recognition

Optical Character Recognition

Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. PATTERN RECOGNITION BY MACHINE. In Computers & Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.

DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents

Dhouib, Mohamed, Bettaieb, Ghassen, Shabou, Aymen

arXiv.org Artificial Intelligence, May-1-2023

Information Extraction from visually rich documents is a challenging task that has gained a lot of attention in recent years due to its importance in several document-control based applications and its widespread commercial value. The majority of the research work conducted on this topic to date follows a two-step pipeline: first, the text is read using an off-the-shelf Optical Character Recognition (OCR) engine; then, the fields of interest are extracted from the obtained text. The main drawback of these approaches is their dependence on an external OCR system, which can negatively impact both performance and computational speed. Recent OCR-free methods were proposed to address these issues. Inspired by their promising results, we propose in this paper an OCR-free end-to-end information extraction model named DocParser. It differs from prior end-to-end approaches by its ability to better extract discriminative character features. DocParser achieves state-of-the-art results on various datasets, while still being faster than previous works.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2304.12484

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.84)

DiffVoice: Text-to-Speech with Latent Diffusion

Liu, Zhijun, Guo, Yiwei, Yu, Kai

arXiv.org Artificial Intelligence, Apr-23-2023

In this work, we present DiffVoice, a novel text-to-speech model based on latent diffusion. We propose to first encode speech signals into a phoneme-rate latent representation with a variational autoencoder enhanced by adversarial training, and then jointly model the duration and the latent representation with a diffusion model. Subjective evaluations on LJSpeech and LibriTTS datasets demonstrate that our method beats the best publicly available systems in naturalness. By adopting recent generative inverse problem solving algorithms for diffusion models, DiffVoice achieves state-of-the-art performance in text-based speech editing and zero-shot adaptation.

artificial intelligence, machine learning, optical character recognition, (14 more...)

arXiv.org Artificial Intelligence

2304.11750

Country: Asia > China > Shanghai > Shanghai (0.05)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.64)

What's new in Swift Release 23 in January? – Ephlux

#artificialintelligence, Apr-18-2023, 13:16:38 GMT

With this release of Swift, users can now benefit from the latest menu designer functionality, which makes it simple to further refine the menu to improve findability and provide a uniform experience across devices. Simply open your Swift App Designer Studio, click the Edit icon, scroll down, and tap the "click to configure" option. The menu designer then appears, and all you have to do is drag and drop items and rename your sub-menus as you prefer. This release of Swift also introduces OCR functionality. Using the OCR control, end-users can convert images into text using Optical Character Recognition, saving time and increasing accuracy in data entry.

functionality, no-code platform, swift release 23, (7 more...)

#artificialintelligence

Industry: Information Technology (0.33)

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.57)

VALL-E -- The Future of Text to Speech?

#artificialintelligence, Apr-14-2023, 18:05:18 GMT

In this article, we will dive deep into a new and exciting text-to-speech model developed by Microsoft Research, called VALL-E. The paper presenting the work was released on Jan. 5, 2023, and has since gained much attention online. It is worth noting that, as of writing this article, no pre-trained model has been released, and the only option currently to battle-test this model is to train it yourself. Nevertheless, the idea presented in this paper is novel and interesting and worth digging into, regardless of whether I can immediately clone my voice with it or not. The technology of text-to-speech is not new and has been around since the "Voder" -- the first electronic voice synthesizer from Bell Labs in 1939, which required manual operation.

codebook, representation, vall-e, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Adaptive Elastic Models for Hand-Printed Character Recognition

Neural Information Processing Systems, Apr-6-2023, 19:27:06 GMT

Hand-printed digits can be modeled as splines that are governed by about 8 control points. Images of digits can be produced by placing Gaussian ink generators uniformly along the spline. Real images can be recognized by finding the digit model most likely to have generated the data. For each digit model we use an elastic matching algorithm to minimize an energy function that includes both the deformation energy of the digit model and the log probability that the model would generate the inked pixels in the image. If a uniform noise process is included in the model of image generation, some of the inked pixels can be rejected as noise as a digit model is fitting a poorly segmented image.

adaptive elastic model, digit model, hand-printed character recognition, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.40)
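
The data term of the energy function described in the elastic-model abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name ink_log_likelihood, the parameters sigma, noise_weight and area, and the hand-placed bead positions are all hypothetical, and the beads are given directly instead of being sampled from a fitted spline. Each inked pixel is scored under an equal-weight mixture of isotropic Gaussian "ink generators" plus a uniform noise component.

```python
import math

def ink_log_likelihood(pixels, beads, sigma=1.0, noise_weight=0.1, area=256.0):
    """Log-probability of inked pixels under Gaussian beads plus uniform noise.

    pixels, beads: lists of (x, y) coordinates.
    sigma:         standard deviation of every Gaussian ink generator (assumed).
    noise_weight:  mixing proportion of the uniform noise process (assumed).
    area:          image area over which the noise is uniform (assumed).
    """
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    total = 0.0
    for px, py in pixels:
        # Equal-weight mixture of isotropic Gaussians centred on the beads.
        gauss = sum(
            norm * math.exp(-((px - bx) ** 2 + (py - by) ** 2) / (2.0 * sigma ** 2))
            for bx, by in beads
        ) / len(beads)
        # The uniform component keeps far-away pixels from having zero probability.
        total += math.log((1.0 - noise_weight) * gauss + noise_weight / area)
    return total

# Hypothetical model of the digit "1": beads spaced evenly on a vertical stroke.
beads = [(8.0, float(y)) for y in range(2, 14)]
on_stroke = [(8.0, 4.0), (8.0, 9.0)]    # inked pixels lying on the stroke
off_stroke = [(13.0, 4.0), (13.0, 9.0)]  # the same pixels shifted 5 units right

ll_on = ink_log_likelihood(on_stroke, beads)
ll_off = ink_log_likelihood(off_stroke, beads)
```

With the noise component included, a stray pixel far from every bead contributes roughly log(noise_weight / area) instead of an unbounded penalty, which is the mechanism by which such pixels can be treated as noise. A full matcher would add a deformation energy and minimize the sum over the control points.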

Improving Performance in Neural Networks Using a Boosting Algorithm

Neural Information Processing Systems, Apr-6-2023, 19:11:21 GMT

A boosting algorithm converts a learning machine with error rate less than 50% to one with an arbitrarily low error rate. However, the algorithm discussed here depends on having a large supply of independent training samples. We show how to circumvent this problem and generate an ensemble of learning machines whose performance in optical character recognition problems is dramatically improved over that of a single network. We report the effect of boosting on four databases (all handwritten) consisting of 12,000 digits from segmented ZIP codes from the United States Postal Service (USPS) and the following from the National Institute of Standards and Technology (NIST): 220,000 digits, 45,000 upper case alphas, and 45,000 lower case alphas. We use two performance measures: the raw error rate (no rejects) and the reject rate required to achieve a 1% error rate on the patterns not rejected. Boosting improved performance in some cases by a factor of three.

algorithm, error rate, neural network, (2 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.65)
Government > Post Office (0.65)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
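
The two performance measures named in the boosting abstract above can be computed as in the sketch below. It assumes each classified pattern comes with a confidence score and that the least confident patterns are rejected first; the abstract does not say how rejection is decided, so the scoring rule and the function names raw_error_rate and reject_rate_for_target are illustrative assumptions.

```python
def raw_error_rate(results):
    """Fraction of misclassified patterns when nothing is rejected.

    results: list of (confidence, is_correct) pairs, one per test pattern.
    """
    return sum(1 for _, ok in results if not ok) / len(results)

def reject_rate_for_target(results, target=0.01):
    """Smallest fraction of lowest-confidence patterns that must be rejected
    so that the error rate on the remaining patterns is at most `target`."""
    ranked = sorted(results, key=lambda r: r[0])  # least confident first
    errors = sum(1 for _, ok in ranked if not ok)  # errors among kept patterns
    total = len(ranked)
    for rejected, (_, ok) in enumerate(ranked):
        kept = total - rejected
        if errors / kept <= target:
            return rejected / total
        if not ok:
            errors -= 1  # this pattern is rejected next, so its error is dropped
    return 1.0  # target unreachable without rejecting everything

# Hypothetical test set: 98 confident correct answers, 2 low-confidence errors.
results = [(0.9, True)] * 98 + [(0.2, False), (0.3, False)]
raw = raw_error_rate(results)              # 2 errors out of 100 patterns
reject = reject_rate_for_target(results)   # both errors must be rejected
```

In this toy set, rejecting only one error still leaves 1 error in 99 kept patterns, slightly above 1%, so both low-confidence patterns have to go.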

Learning to See Where and What: Training a Net to Make Saccades and Recognize Handwritten Characters

Neural Information Processing Systems, Apr-6-2023, 19:08:41 GMT

The approach, called Saccade, integrates ballistic and corrective saccades (eye movements) with character recognition. A single backpropagation net is trained to make a classification decision on a character centered in its input window, as well as to estimate the distance of the current and next character from the center of the input window. The net learns to accurately estimate these distances regardless of variations in character width, spacing between characters, writing style and other factors. During testing, the system uses the net-extracted classification and distance information, along with a set of jumping rules, to jump from character to character. The ability to read rests on multiple foundation skills.

make saccade, saccade and recognize handwritten character, visual field, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.58)

Planar Hidden Markov Modeling: From Speech to Optical Character Recognition

Neural Information Processing Systems, Apr-6-2023, 19:03:55 GMT

We propose in this paper a statistical model (planar hidden Markov model - PHMM) describing statistical properties of images. The model generalizes the single-dimensional HMM, used for speech processing, to the planar case. For this model to be useful, an efficient segmentation algorithm, similar to the Viterbi algorithm for HMM, must exist. We present conditions in terms of the PHMM parameters that are sufficient to guarantee that the planar segmentation problem can be solved in polynomial time, and describe an algorithm for that. This algorithm aligns optimally the image with the model, and therefore is insensitive to elastic distortions of images. Using this algorithm, a joint optimal segmentation and recognition of the image can be performed, thus overcoming the weakness of traditional OCR systems where segmentation is performed independently before the recognition, leading to unrecoverable recognition errors.

algorithm, planar hidden markov modeling, recognition, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
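
For reference, the one-dimensional Viterbi algorithm that the PHMM abstract above takes as its starting point can be sketched as follows. This is the standard single-dimensional decoder, not the planar segmentation algorithm of the paper, and the two-state "bg"/"fg" model with its probabilities is a made-up toy example.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # best[s] = log-probability of the best path that ends in state s
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict per later step: best predecessor of each state
    for o in obs[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
            ptr[s] = p
        back.append(ptr)
    # Trace the best final state back to the start.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return best[last], path

def logs(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Toy model: background ("bg") mostly emits 0, foreground ("fg") mostly emits 1.
states = ["bg", "fg"]
log_start = logs({"bg": 0.5, "fg": 0.5})
log_trans = logs({"bg": {"bg": 0.8, "fg": 0.2}, "fg": {"bg": 0.2, "fg": 0.8}})
log_emit = logs({"bg": {0: 0.9, 1: 0.1}, "fg": {0: 0.1, 1: 0.9}})

log_prob, path = viterbi([0, 0, 1, 1], states, log_start, log_trans, log_emit)
print(path)  # ['bg', 'bg', 'fg', 'fg']
```

The decoder segments the sequence and labels it in a single pass, which is the joint segmentation and recognition property the abstract carries over to images.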

Transformation Invariant Autoassociation with Application to Handwritten Character Recognition

Neural Information Processing Systems, Apr-6-2023, 18:44:05 GMT

When training neural networks by the classical backpropagation algorithm the whole problem to learn must be expressed by a set of inputs and desired outputs. However, we often have high-level knowledge about the learning problem. In optical character recognition (OCR), for instance, we know that the classification should be invariant under a set of transformations like rotation or translation. We propose a new modular classification system based on several autoassociative multilayer perceptrons which allows the efficient incorporation of such knowledge. Results are reported on the NIST database of upper case handwritten letters and compared to other approaches to the invariance problem.

handwritten character recognition, knowledge, transformation invariant autoassociation, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Learning Prototype Models for Tangent Distance

Neural Information Processing Systems, Apr-6-2023, 18:36:36 GMT

Simard, LeCun & Denker (1993) showed that the performance of nearest-neighbor classification schemes for handwritten character recognition can be improved by incorporating invariance to specific transformations in the underlying distance metric, the so-called tangent distance. The resulting classifier, however, can be prohibitively slow and memory intensive due to the large number of prototypes that need to be stored and used in the distance comparisons. In this paper we develop rich models for representing large subsets of the prototypes. These models are either used singly per class, or as basic building blocks in conjunction with the K-means clustering algorithm.

learning prototype model, tangent distance, trevor hastie

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
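
The idea behind tangent distance in the abstract above can be illustrated with a deliberately simplified sketch. It assumes a one-sided distance with a single tangent vector (horizontal translation of a 1-D signal, approximated by central differences); the metric of Simard, LeCun & Denker is more general, and the function names here are hypothetical.

```python
def translation_tangent(p):
    """Approximate derivative of a 1-D signal with respect to a rightward shift.

    Uses central differences; the two end points are set to 0 (an assumption).
    """
    n = len(p)
    return [0.0 if i in (0, n - 1) else -(p[i + 1] - p[i - 1]) / 2
            for i in range(n)]

def one_sided_tangent_distance(x, p, t):
    """Squared distance from x to the line {p + a * t : a real}."""
    d = [xi - pi for xi, pi in zip(x, p)]
    tt = sum(ti * ti for ti in t)
    # Best step along the tangent: projection of (x - p) onto t.
    a = sum(di * ti for di, ti in zip(d, t)) / tt if tt else 0.0
    return sum((di - a * ti) ** 2 for di, ti in zip(d, t))

def squared_euclidean(x, p):
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))

prototype = [0, 0, 1, 2, 1, 0, 0]   # a small bump
shifted = [0, 0, 0, 1, 2, 1, 0]     # the same bump moved one step right

tangent = translation_tangent(prototype)
print(squared_euclidean(shifted, prototype))                    # 4
print(one_sided_tangent_distance(shifted, prototype, tangent))  # 1.5
```

The shifted bump is much closer to the prototype under the tangent distance than under the plain Euclidean distance, because part of the difference is explained by moving along the translation tangent. Each comparison needs the prototype's tangent vectors as well as the prototype itself, which hints at the storage and speed cost the abstract mentions.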
