Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser, "Pattern Recognition by Machine," in Computers and Thought, Edward A. Feigenbaum and Julian Feldman (Eds.), MIT Press, Cambridge, MA, USA, 1963, pp. 8–30.
The dynamic design duo of Karen Ann Donnachie and Andy Simionato are set to receive the Tokyo Type Directors Club award for their AI reading machine project – a machine that essentially transforms books into short haikus accompanied by related images. It does this by using computer vision and optical character recognition to 'read' books. Then, with machine learning and natural language processing, it selects a poetic combination of words and erases the rest to form an artsy-looking haiku. While doing this, the reading machine also uses Google to search for images related to the selected words. Donnachie and Simionato have released a series of books that we know and love with a slight twist.
Crowdsourcing systems, in which tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large-scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way, such as majority voting. In this paper, we consider a general model of such crowdsourcing tasks, and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give new algorithms for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithms significantly outperform majority voting and, in fact, are asymptotically optimal through comparison to an oracle that knows the reliability of every worker.
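The baseline aggregation scheme the abstract describes — assigning each task to several workers and combining their answers — can be sketched as plain majority voting. This is an illustrative sketch only; the function name and data layout are not from the paper, whose own algorithms go beyond this baseline:

```python
from collections import Counter

def majority_vote(assignments):
    """Aggregate redundant worker answers per task by majority vote.

    assignments: dict mapping a task id to the list of answers the
    workers assigned to that task gave.
    Returns a dict mapping each task id to the most common answer.
    """
    return {task: Counter(answers).most_common(1)[0][0]
            for task, answers in assignments.items()}

# Each image-labeling task was assigned to three workers.
answers = {
    "img-1": ["cat", "cat", "dog"],  # two of three workers agree
    "img-2": ["dog", "dog", "dog"],  # unanimous
}
print(majority_vote(answers))  # {'img-1': 'cat', 'img-2': 'dog'}
```

The paper's point is that uniform redundancy like this wastes budget on easy tasks and unreliable workers; its adaptive assignment and inference algorithms achieve the target reliability at lower total cost.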
We introduce a large-volume box classification for binary prediction, which maintains a subset of weight vectors, specifically axis-aligned boxes. Our learning algorithm seeks a box of large volume that contains "simple" weight vectors, most of which are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and it is shown how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound and is shown to minimize a quantity directly aligned with it. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of 30 NLP datasets and binarized USPS optical character recognition datasets.
Analytic shrinkage is a statistical technique that offers a fast alternative to cross-validation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency implies bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real-world data. In addition, we propose an extension of analytic shrinkage, orthogonal complement shrinkage, which adapts to the covariance structure. Finally, we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience.
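The underlying shrinkage idea — regularizing a covariance estimate by blending the sample covariance with a well-conditioned target — can be sketched as follows. Analytic shrinkage derives the shrinkage intensity from the data; this sketch fixes it at an illustrative value, and the function name is hypothetical:

```python
import numpy as np

def shrink_covariance(X, delta=0.2):
    """Shrink the sample covariance toward a scaled identity target.

    X: (n_samples, p) data matrix. delta: shrinkage intensity in
    [0, 1]; analytic shrinkage computes this value in closed form,
    but here it is simply fixed for illustration.
    """
    S = np.cov(X, rowvar=False)       # sample covariance
    mu = np.trace(S) / S.shape[0]     # average eigenvalue
    target = mu * np.eye(S.shape[0])  # well-conditioned target
    return (1.0 - delta) * S + delta * target

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
S_shrunk = shrink_covariance(X)
# Shrinkage pulls eigenvalues toward their mean, so the estimate is
# never worse conditioned than the raw sample covariance.
print(np.linalg.cond(S_shrunk) <= np.linalg.cond(np.cov(X, rowvar=False)))
```

The eigenvalue-growth bounds discussed in the abstract concern exactly this blend: consistency of the data-driven choice of delta depends on how the spectrum of the true covariance behaves as the dimension grows.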
In user-facing applications, displaying calibrated confidence measures (probabilities that correspond to true frequency) can be as important as obtaining high accuracy. We are interested in calibration for structured prediction problems such as speech recognition, optical character recognition, and medical diagnosis. Structured prediction presents new challenges for calibration: the output space is large, and users may issue many types of probability queries (e.g., marginals) on the structured output. We extend the notion of calibration so as to handle various subtleties pertaining to the structured setting, and then provide a simple recalibration method that trains a binary classifier to predict probabilities of interest. We explore a range of features appropriate for structured recalibration, and demonstrate their efficacy on three real-world datasets.
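The recalibration idea — learning a map from raw model confidences to empirical correctness frequencies — can be sketched with simple histogram binning. This is a stand-in for the binary classifier the abstract trains, with illustrative names throughout:

```python
import numpy as np

def histogram_recalibrate(confidences, correct, n_bins=10):
    """Fit a histogram-binning recalibration map.

    confidences: raw model confidences in [0, 1].
    correct: 1 where the prediction was right, else 0.
    Returns a function mapping a raw confidence to the empirical
    accuracy observed in its bin; empty bins fall back to the bin
    midpoint.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = np.clip(np.digitize(confidences, edges) - 1, 0, n_bins - 1)
    acc = np.array([
        correct[bins == b].mean() if np.any(bins == b)
        else (edges[b] + edges[b + 1]) / 2
        for b in range(n_bins)
    ])

    def calibrated(p):
        b = min(max(int(p * n_bins), 0), n_bins - 1)
        return acc[b]

    return calibrated

# An overconfident model: raw confidence 0.95, empirical accuracy 0.6.
conf = np.array([0.95, 0.95, 0.95, 0.95, 0.95])
corr = np.array([1, 1, 1, 0, 0])
calibrated = histogram_recalibrate(conf, corr)
print(calibrated(0.95))  # 0.6
```

In the structured setting the abstract targets, the inputs to the recalibrator would be richer features of the structured output (e.g., margin between the top two hypotheses) rather than a single raw confidence.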
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a pipeline similar to Deep Voice 1's but constructed with higher-performance building blocks, and which demonstrates a significant audio quality improvement over Deep Voice 1. We improve Tacotron by introducing a post-processing neural vocoder, and demonstrate a significant audio quality improvement. We then demonstrate our technique for multi-speaker speech synthesis for both Deep Voice 2 and Tacotron on two multi-speaker TTS datasets.
Utopia Global, Inc., a leading global data solutions company known for its end-to-end data quality, data migration, and data governance software solutions, has announced the launch of a new cloud-based intelligent software platform: Intelligent Data Capture and Control (IDCC). IDCC is Utopia's new cloud-based master data enrichment and governance solution. It provides asset-intensive organizations with an automated, easily deployable suite to rapidly improve the quality of asset master data in their SAP or non-SAP maintenance systems of record. "Utopia is thrilled to continue our co-innovation with the SAP Asset Management team, IDCC being our newest contribution to the SAP Intelligent Asset Management solution. This release of IDCC provides access to our robust machine learning engine for creating high-quality material and asset master data from multiple sources, including unstructured content," said Arvind J. Singh, Chairman and CEO of Utopia Global, Inc. IDCC uniquely leverages optical character recognition, Utopia's advanced machine learning code, intelligent online web search, and document search.
OpenText, a global leader in Enterprise Information Management (EIM), announced that Univar Solutions EMEA, a leading distributor of chemical ingredients and services in Europe, is working with OpenText Professional Services to upgrade its deployment of OpenText Vendor Invoice Management for SAP Solutions to further transform its accounts payable operations with new AI, intelligent capture and automation capabilities. OpenText Vendor Invoice Management for SAP routes invoices automatically to the right person for resolution, approval and payment. New enhancements to the solution will boost Univar Solutions EMEA's operations by giving the company access to OCR line item recognition, improving invoice training and automating previously manual freight processing and costing. "Deep integration between OpenText and SAP is helping us continuously streamline our accounts payable processes, while continuing to find productivity gains through automation and innovation," said Brian Morgan, IT director EMEA, Univar. "We are working with OpenText Professional Services to take advantage of new capabilities in AI and process automation, ensuring that our people are focused on the customer-facing work which matters most to our business." Powerful optical character recognition combined with machine learning and intelligent automation enables content to be matched against supplier delivery notes. This helps Univar Solutions EMEA continuously identify and remove bottlenecks and automatically correct errors or inefficiencies before they impact customer satisfaction. Advanced analytics and reporting tools give Univar Solutions EMEA greater visibility over its accounts payable processes, helping ensure governance, compliance and clarity. OpenText helps companies connect business applications, digital business processes and proprietary company content.
Neural network based end-to-end text to speech (TTS) has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate a mel-spectrogram from text, and then synthesize speech from the mel-spectrogram using a vocoder such as WaveNet. Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control). In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrograms in parallel for TTS. Specifically, we extract attention alignments from an encoder-decoder based teacher model for phoneme duration prediction, which is used by a length regulator to expand the source phoneme sequence to match the length of the target mel-spectrogram sequence for parallel mel-spectrogram generation.
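The length-regulator step described above — expanding each phoneme's hidden state by its predicted duration so the expanded sequence matches the mel-spectrogram length — can be sketched as follows. Names and the toy phoneme sequence are illustrative, not from the paper:

```python
def length_regulate(phoneme_hidden, durations):
    """Expand per-phoneme states to per-frame states (sketch).

    phoneme_hidden: list of per-phoneme hidden states (any objects;
    in the real model these would be encoder output vectors).
    durations: predicted number of mel frames for each phoneme.
    Repeats each state by its duration so the output length equals
    the target mel-spectrogram length, enabling parallel generation.
    """
    expanded = []
    for h, d in zip(phoneme_hidden, durations):
        expanded.extend([h] * d)
    return expanded

# "HH AH L OW" with predicted durations summing to 9 mel frames.
print(length_regulate(["HH", "AH", "L", "OW"], [2, 3, 1, 3]))
# ['HH', 'HH', 'AH', 'AH', 'AH', 'L', 'OW', 'OW', 'OW']
```

Because every frame's input is known up front, the decoder no longer generates frames autoregressively, which is the source of the inference speedup the abstract claims.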
Optical character recognition (OCR), the conversion of images of handwritten or printed text into machine-readable text, is a science that dates back to the early '70s. But algorithms have long struggled to make out characters that aren't aligned with the horizontal, which is why researchers at Amazon developed what they call TextTubes: detectors for curved text in natural images that model that text as tubes around its medial (middle) axis. In a paper describing their work, the coauthors claim that their approach achieves state-of-the-art results on a popular OCR benchmark. As the researchers explain, scene text recognition is typically broken down into two successive tasks: text detection and text recognition.