Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. PATTERN RECOGNITION BY MACHINE. In Computers & thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.
OpenText, a global leader in Enterprise Information Management (EIM), announced that Univar Solutions EMEA, a leading distributor of chemical ingredients and services in Europe, is working with OpenText Professional Services to upgrade its deployment of OpenText Vendor Invoice Management for SAP Solutions, further transforming its accounts payable operations with new AI, intelligent capture and automation capabilities. OpenText Vendor Invoice Management for SAP automatically routes invoices to the right person for resolution, approval and payment. New enhancements to the solution will boost Univar Solutions EMEA's operations by giving the company access to OCR line-item recognition, improving invoice training and automating previously manual freight processing and costing. "Deep integration between OpenText and SAP is helping us continuously streamline our accounts payable processes, while continuing to find productivity gains through automation and innovation," said Brian Morgan, IT director EMEA, Univar. "We are working with OpenText Professional Services to take advantage of new capabilities in AI and process automation, ensuring that our people are focused on the customer-facing work which matters most to our business." Powerful optical character recognition combined with machine learning and intelligent automation enables content to be matched against supplier delivery notes. This helps Univar Solutions EMEA continuously identify and remove bottlenecks and automatically correct errors or inefficiencies before they impact customer satisfaction. Advanced analytics and reporting tools give Univar Solutions EMEA greater visibility over its accounts payable processes, helping ensure governance, compliance and clarity. OpenText helps companies connect business applications, digital business processes and proprietary company content.
Crowdsourcing systems, in which tasks are electronically distributed to numerous "information piece-workers," have emerged as an effective paradigm for human-powered solving of large-scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way, such as majority voting. In this paper, we consider a general model of such crowdsourcing tasks and pose the problem of minimizing the total price (i.e., the number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm significantly outperforms majority voting and, in fact, is asymptotically optimal, through comparison to an oracle that knows the reliability of every worker.
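The majority-voting baseline mentioned above is easy to state in code. The sketch below contrasts it with a simple reliability-weighted vote that alternates between estimating answers and estimating each worker's accuracy. This weighting heuristic is an illustrative stand-in, not the paper's algorithm, and the names `majority_vote` and `weighted_vote` are hypothetical. Answers are assumed to be binary labels in {+1, -1}.

```python
import math
from collections import defaultdict


def majority_vote(answers):
    """answers: dict mapping (task, worker) -> label in {+1, -1}."""
    totals = defaultdict(int)
    for (task, _worker), label in answers.items():
        totals[task] += label
    # Ties are broken toward +1 (an arbitrary choice).
    return {task: 1 if s >= 0 else -1 for task, s in totals.items()}


def weighted_vote(answers, iterations=10):
    """Alternate between estimating labels and estimating worker accuracy.

    Each vote is weighted by the log-odds of how often its worker agrees with
    the current consensus (a heuristic, not the paper's method). A worker who
    is worse than chance gets a negative weight, so their votes are flipped.
    """
    estimate = majority_vote(answers)
    for _ in range(iterations):
        agree, count = defaultdict(int), defaultdict(int)
        for (task, worker), label in answers.items():
            count[worker] += 1
            agree[worker] += label == estimate[task]
        weight = {}
        for worker in count:
            # Laplace smoothing keeps the accuracy strictly inside (0, 1).
            acc = (agree[worker] + 1) / (count[worker] + 2)
            weight[worker] = math.log(acc / (1 - acc))
        totals = defaultdict(float)
        for (task, worker), label in answers.items():
            totals[task] += weight[worker] * label
        estimate = {task: 1 if s >= 0 else -1 for task, s in totals.items()}
    return estimate
```

With three always-correct workers on every task, two more on most tasks, and four workers who always answer +1, majority voting fails wherever the +1 answerers outnumber the reliable ones; after one round the weighted vote assigns the +1 answerers a negative weight and recovers every label.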
We introduce a large-volume box classifier for binary prediction, which maintains a subset of weight vectors, specifically an axis-aligned box. Our learning algorithm seeks a box of large volume that contains "simple" weight vectors, most of which are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and it is shown how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound and is shown to minimize a quantity directly aligned with that bound. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of 30 NLP datasets and binarized USPS optical character recognition datasets.
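One property that makes axis-aligned boxes convenient: because y(w·x) is linear in w, the worst-case margin over every weight vector in the box has a closed form, evaluated coordinate by coordinate. The sketch below computes it alongside the margin of the box center and the box's log-volume. Predicting with the center and the helper names `box_margins` and `log_volume` are illustrative assumptions; the paper's convex training objective is not reproduced.

```python
import numpy as np


def box_margins(lower, upper, x, y):
    """Margins of the box of weight vectors {w : lower <= w <= upper} on (x, y).

    Returns (margin of the box center, worst-case margin over the whole box).
    y is a label in {+1, -1}.
    """
    center = (lower + upper) / 2
    center_margin = y * center @ x
    # y * w.x is linear in w, so its minimum over the box is attained
    # coordinate-wise at whichever endpoint makes y * w_k * x_k smaller.
    worst = np.minimum(y * lower * x, y * upper * x).sum()
    return center_margin, worst


def log_volume(lower, upper):
    # The volume is the product of side lengths; summing logs avoids underflow.
    return np.sum(np.log(upper - lower))
```

A worst-case margin above zero means every weight vector in the box classifies the example correctly; a box can trade volume against how many training examples satisfy that condition.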
Analytic shrinkage is a statistical technique that offers a fast alternative to cross-validation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency implies bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real-world data. In addition, we propose an extension of analytic shrinkage (orthogonal complement shrinkage) which adapts to the covariance structure. Finally, we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience.
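For reference, the classical analytic shrinkage estimator in the form popularized by Ledoit and Wolf shrinks the sample covariance S toward a scaled identity μI, with an intensity computed in closed form from the data instead of by cross-validation. The sketch below implements that standard estimator, not the orthogonal complement extension proposed here; the name `analytic_shrinkage` is hypothetical.

```python
import numpy as np


def analytic_shrinkage(X):
    """Ledoit-Wolf style shrinkage of the sample covariance toward mu * I.

    X: array of shape (n_samples, n_features).
    Returns (shrinkage intensity in [0, 1], shrunk covariance matrix).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                      # sample covariance (divides by n)
    mu = np.trace(S) / p                   # average eigenvalue
    target = mu * np.eye(p)
    # Squared Frobenius distance between S and the target.
    delta2 = np.sum((S - target) ** 2)
    # Estimated variance of S: average squared deviation of the per-sample
    # outer products from S, divided by n.
    beta2_bar = sum(np.sum((np.outer(x, x) - S) ** 2) for x in Xc) / n**2
    beta2 = min(beta2_bar, delta2)         # keeps the intensity at most 1
    lam = 0.0 if delta2 == 0 else beta2 / delta2
    return lam, lam * target + (1 - lam) * S
```

Because the target has the same trace as S, shrinkage preserves the total variance while lifting the smallest eigenvalues; with more features than samples, S is singular but the shrunk matrix is positive definite. scikit-learn ships a production implementation as `sklearn.covariance.ledoit_wolf`.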
In user-facing applications, displaying calibrated confidence measures (probabilities that correspond to true frequencies) can be as important as obtaining high accuracy. We are interested in calibration for structured prediction problems such as speech recognition, optical character recognition, and medical diagnosis. Structured prediction presents new challenges for calibration: the output space is large, and users may issue many types of probability queries (e.g., marginals) on the structured output. We extend the notion of calibration so as to handle various subtleties pertaining to the structured setting, and then provide a simple recalibration method that trains a binary classifier to predict probabilities of interest. We explore a range of features appropriate for structured recalibration, and demonstrate their efficacy on three real-world datasets.
Neural network based end-to-end text to speech (TTS) has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate a mel-spectrogram from text and then synthesize speech from the mel-spectrogram using a vocoder such as WaveNet. Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control). In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrograms in parallel for TTS. Specifically, we extract attention alignments from an encoder-decoder based teacher model for phoneme duration prediction; a length regulator uses these durations to expand the source phoneme sequence to match the length of the target mel-spectrogram sequence, enabling parallel mel-spectrogram generation.
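The length regulator described above is simple enough to sketch directly: each phoneme's hidden state is repeated once per predicted mel frame, so the expanded sequence matches the target spectrogram length and can be decoded in parallel. This sketch assumes NumPy arrays, integer frame durations, and a speed factor `alpha` applied by scaling and rounding the durations; the function name and the rounding rule are assumptions rather than the paper's exact code.

```python
import numpy as np


def length_regulator(phoneme_hidden, durations, alpha=1.0):
    """Expand phoneme-level hidden states to frame level.

    phoneme_hidden: array of shape (num_phonemes, hidden_dim).
    durations: number of mel frames per phoneme (e.g. from a duration predictor).
    alpha: speed factor; alpha > 1 lengthens every phoneme (slower speech),
           alpha < 1 shortens them (faster speech).
    """
    scaled = np.round(np.asarray(durations) * alpha).astype(int)
    # Repeat row i of phoneme_hidden scaled[i] times along the time axis.
    return np.repeat(phoneme_hidden, scaled, axis=0)
```

For three phonemes with durations [2, 3, 1], the output has 6 frames; with `alpha=2.0` it has 12. Because the output length is known before decoding, every frame can be generated at once instead of one after another.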
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a pipeline similar to that of Deep Voice 1 but is constructed with higher-performance building blocks, and which demonstrates a significant audio quality improvement over Deep Voice 1. We improve Tacotron by introducing a post-processing neural vocoder, and demonstrate a significant audio quality improvement. We then demonstrate our technique for multi-speaker speech synthesis for both Deep Voice 2 and Tacotron on two multi-speaker TTS datasets.
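The abstract does not say where the speaker embedding enters the network. As a minimal sketch, assuming the simplest option of projecting a per-speaker vector to the model's hidden size and adding it at every time step, conditioning could look like the class below. `SpeakerConditioning`, the tanh nonlinearity, and the random initialization are hypothetical, and training of the embedding table is omitted.

```python
import numpy as np


class SpeakerConditioning:
    """Hypothetical layer: add a projected speaker embedding to hidden states."""

    def __init__(self, num_speakers, embed_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One low-dimensional vector per speaker (trainable in a real model).
        self.embeddings = rng.normal(0.0, 0.1, size=(num_speakers, embed_dim))
        # Projection from the embedding space to the model's hidden size.
        self.proj = rng.normal(0.0, 0.1, size=(embed_dim, hidden_dim))

    def __call__(self, hidden, speaker_id):
        """hidden: (time, hidden_dim). Returns speaker-conditioned states."""
        bias = np.tanh(self.embeddings[speaker_id] @ self.proj)
        # Broadcasting adds the same speaker-specific bias at every time step.
        return hidden + bias
```

Because only the small embedding table is speaker-specific, adding a voice costs one `embed_dim`-sized vector while every other weight is shared across speakers.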
Optical character recognition (OCR), the conversion of images of handwritten or printed text into machine-readable text, is a field that dates back to at least the 1960s. But algorithms have long struggled to make out text that isn't aligned horizontally, which is why researchers at Amazon developed what they call TextTubes: detectors for curved text in natural images that model each text instance as a tube around its medial (middle) axis. In a paper describing their work, the coauthors claim that their approach achieves state-of-the-art results on a popular OCR benchmark. As the researchers explain, reading scene text is typically broken down into two successive tasks: text detection and text recognition.
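To make the tube idea concrete: if a curved text instance is represented by a medial axis (a polyline through the middle of the text) plus a radius, deciding whether a pixel belongs to the text reduces to a point-to-polyline distance test. This geometric sketch assumes a fixed radius and a polyline axis; it does not reproduce TextTubes' learned detector or its training loss, and the function names are hypothetical.

```python
import math


def distance_to_segment(p, a, b):
    """Euclidean distance from point p to segment ab (all 2-D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def inside_tube(p, axis, radius):
    """True if p lies within `radius` of the polyline `axis` (list of points).

    An axis with fewer than two points has no segments, so nothing is inside.
    """
    return any(distance_to_segment(p, a, b) <= radius
               for a, b in zip(axis, axis[1:]))
```

For an arched axis through (0, 0), (10, 5) and (20, 0) with radius 2, the point (10, 6.5) is inside the tube while (10, -5) is not, even though an axis-aligned box around the axis would contain both.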
Does your OCR technology make sense of the data it extracts? Traditional OCR technology is less accurate because it does not understand what it is extracting, so a considerable number of errors occur. Removing those errors requires manual fixing, which is time-consuming and takes significant resources. The AI-powered Infrrd OCR removes these difficulties by applying machine learning algorithms to understand the extracted data and improve the output automatically. When it comes to choosing an OCR app, accuracy is one of the most important criteria.
Text to speech (TTS) has attracted a lot of attention recently due to advancements in deep learning. Neural network-based TTS models (such as Tacotron 2, DeepVoice 3 and Transformer TTS) have outperformed conventional concatenative and statistical parametric approaches in terms of speech quality. Neural network-based TTS models usually first generate a mel-scale spectrogram (or mel-spectrogram) autoregressively from text input and then synthesize speech from the mel-spectrogram using a vocoder. (A spectrogram is a visual representation of frequencies measured over time.) To address the slow inference, poor robustness and limited controllability of these autoregressive models, researchers from Microsoft and Zhejiang University propose FastSpeech, a novel feed-forward network that generates mel-spectrograms with fast generation speed, robustness, controllability, and high quality.
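A mel-spectrogram differs from a plain spectrogram by pooling frequencies into bands spaced evenly on the mel scale, which compresses high frequencies roughly as human pitch perception does. The sketch below uses the common HTK-style formula mel(f) = 2595 log10(1 + f/700) and computes evenly mel-spaced band centers. Other variants (such as Slaney's) exist, so which formula a given TTS system uses is an assumption here, as are the function names and the 80-band, 8 kHz example.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale: roughly linear at low frequencies, logarithmic above.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Exact inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(num_bands, f_min, f_max):
    """Center frequencies (Hz) of num_bands filters spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (num_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_bands)]
```

With 80 bands between 0 and 8000 Hz, adjacent centers are only tens of hertz apart at the low end but several hundred hertz apart near the top, which is why a mel-spectrogram is far more compact than a linear-frequency one.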