Optical Character Recognition


FastSpeech: Fast, Robust and Controllable Text to Speech

Neural Information Processing Systems

Neural network based end-to-end text to speech (TTS) has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate a mel-spectrogram from text and then synthesize speech from the mel-spectrogram using a vocoder such as WaveNet. Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control). In this work, we propose a novel feed-forward network based on the Transformer to generate mel-spectrograms in parallel for TTS. Specifically, we extract attention alignments from an encoder-decoder based teacher model for phoneme duration prediction; the predicted durations are used by a length regulator to expand the source phoneme sequence to match the length of the target mel-spectrogram sequence for parallel mel-spectrogram generation.
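The two steps named above (durations from teacher attention, then length regulation) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a single teacher alignment matrix `attn[t][i]` (mel frame `t` attending to phoneme `i`) and takes each frame's argmax phoneme; the speed factor `alpha` and the round-half-up rule are assumptions of this sketch, and the duration predictor used at inference time is not shown.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def durations_from_alignment(attn: Sequence[Sequence[float]]) -> List[int]:
    """Count, for each phoneme, the mel frames whose attention peaks on it.

    attn[t][i] is the (assumed) teacher attention weight from mel frame t
    to phoneme i. Ties go to the lower phoneme index.
    """
    n_phonemes = len(attn[0])
    durations = [0] * n_phonemes
    for frame in attn:
        best = max(range(n_phonemes), key=lambda i: frame[i])
        durations[best] += 1
    return durations


def length_regulate(hidden: Sequence[T], durations: Sequence[int],
                    alpha: float = 1.0) -> List[T]:
    """Repeat each phoneme state round(d * alpha) times (round half up).

    alpha > 1 lengthens the output (slower speech); alpha < 1 shortens it.
    """
    expanded: List[T] = []
    for h, d in zip(hidden, durations):
        repeats = math.floor(d * alpha + 0.5)
        expanded.extend([h] * repeats)
    return expanded
```

With phoneme states `["h1", "h2", "h3", "h4"]` and durations `[2, 2, 3, 1]`, `length_regulate` yields eight frames, `h1, h1, h2, h2, h3, h3, h3, h4`; with `alpha=0.5` the durations become `[1, 1, 2, 1]`. Because every output frame is known up front, the decoder can then generate all of them in parallel.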


Utopia Global Releases Cloud-Based Intelligent Data Capture and Control Software: Platform Delivers High-Quality Enriched Asset Master Data Leveraging Machine Learning

#artificialintelligence

IDCC uniquely leverages optical character recognition, Utopia's advanced machine learning code, intelligent online web search, and document search. Beginning simply with only a photo of a manufacturer's nameplate, IDCC can produce complete and accurate material and asset information. Manufacturer and model data is organized according to the ISO-14224 standard and can be delivered via a variety of easy-to-integrate methods, including SAP Asset Intelligence Network. The cloud-based nature of IDCC enables cost-effective, rapid deployments by large and small organizations alike. IDCC can be deployed in pure cloud environments, such as SAP Intelligent Asset Management, or in hybrid deployments using SAP Master Data Governance, enterprise asset management extension by Utopia.


Introducing The AI Reading Machine That Reconstructs Books As Illustrated Haikus

The dynamic design duo of Karen Ann Donnachie and Andy Simionato is set to receive the Tokyo Type Directors Club award for their AI reading machine project, a machine that transforms books into short haikus accompanied by related images. It does this by using computer vision and optical character recognition to 'read' books. Then, with machine learning and natural language processing, it selects a poetic combination of words and erases the rest to form an artsy-looking haiku. At the same time, the reading machine uses Google to search for images related to the selected words. Donnachie and Simionato have released a series of books that we know and love, each with a slight twist.


Iterative Learning for Reliable Crowdsourcing Systems

Neural Information Processing Systems

Crowdsourcing systems, in which tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large-scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way, such as majority voting. In this paper, we consider a general model of such crowdsourcing tasks and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give new algorithms for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm significantly outperforms majority voting and, in fact, is asymptotically optimal through comparison to an oracle that knows the reliability of every worker.
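Majority voting, the baseline the paper compares against, fits in a few lines. This sketch assumes responses arrive as `(task_id, worker_id, answer)` triples, a format chosen here for illustration. It is not the paper's algorithm: it deliberately ignores `worker_id`, and that discarded information (which worker gave which answer) is exactly what a reliability-aware method can exploit.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Response = Tuple[str, str, Hashable]  # (task_id, worker_id, answer)


def majority_vote(responses: List[Response]) -> Dict[str, Hashable]:
    """Aggregate redundant answers by per-task majority.

    Each response is one paid task assignment, so the total price in the
    abstract's sense is len(responses). Ties go to the answer seen first
    for that task (Counter.most_common keeps insertion order on ties).
    """
    by_task: Dict[str, List[Hashable]] = {}
    for task, _worker, answer in responses:
        by_task.setdefault(task, []).append(answer)
    return {task: Counter(answers).most_common(1)[0][0]
            for task, answers in by_task.items()}
```

For example, if three workers answer task `t1` with `+1, +1, -1`, the estimate is `+1`, even if the dissenting worker happens to be the only reliable one.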


Volume Regularization for Binary Classification

Neural Information Processing Systems

We introduce large-volume box classification for binary prediction, which maintains a subset of weight vectors, specifically an axis-aligned box. Our learning algorithm seeks a box of large volume that contains "simple" weight vectors, most of which are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and it is shown how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound, and it is shown to minimize a quantity directly aligned with it. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of 30 NLP datasets and binarized USPS optical character recognition datasets.
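The tension between "large volume" and "accurate weight vectors" can be made concrete with two quantities of an axis-aligned box. This is a hedged sketch of box geometry only, not the paper's convex programs: it assumes the box is parametrized by a center `c` and half-widths `r`, and computes its log-volume `sum(log(2 * r_i))` and the worst-case margin `min over w in box of y * <w, x>`, which equals `y * <c, x> - sum(r_i * |x_i|)`. The margin is positive exactly when every weight vector in the box classifies `(x, y)` correctly; widening the box raises the volume but lowers that margin. The brute-force corner check is included only to confirm the closed form.

```python
import itertools
import math
from typing import Sequence


def worst_case_margin(center: Sequence[float], half_width: Sequence[float],
                      x: Sequence[float], y: int) -> float:
    """min over w in the box [c - r, c + r] of y * <w, x>, in closed form."""
    center_term = y * sum(c * xi for c, xi in zip(center, x))
    spread = sum(r * abs(xi) for r, xi in zip(half_width, x))
    return center_term - spread


def log_volume(half_width: Sequence[float]) -> float:
    """Log-volume of the box; each side has length 2 * r_i."""
    return sum(math.log(2.0 * r) for r in half_width)


def brute_force_margin(center, half_width, x, y) -> float:
    """Same minimum, found by checking every corner of the box."""
    corners = itertools.product(
        *[(c - r, c + r) for c, r in zip(center, half_width)])
    return min(y * sum(w * xi for w, xi in zip(corner, x))
               for corner in corners)
```

A linear function over a box attains its minimum at a corner, which is why the corner enumeration agrees with the closed form.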


Generalizing Analytic Shrinkage for Arbitrary Covariance Structures

Neural Information Processing Systems

Analytic shrinkage is a statistical technique that offers a fast alternative to cross-validation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency implies bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. In addition, we propose an extension of analytic shrinkage, orthogonal complement shrinkage, which adapts to the covariance structure. Finally, we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience.
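For reference, standard analytic shrinkage (the Ledoit-Wolf estimator with a scaled-identity target) replaces cross-validation by a closed-form shrinkage intensity. The sketch below implements that baseline in plain Python; it does not implement the paper's orthogonal complement shrinkage. It assumes samples arrive as a list of rows and normalizes the sample covariance by `n`.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def ledoit_wolf(samples: List[List[float]]) -> Tuple[Matrix, float]:
    """Shrink the sample covariance S toward mu * I with the analytic intensity.

    samples: n observations, each a list of p features.
    Returns (shrunk covariance, shrinkage intensity lam in [0, 1]).
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    xs = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance S, normalized by n.
    s = [[sum(x[i] * x[j] for x in xs) / n for j in range(p)] for i in range(p)]
    # Target scale: average eigenvalue of S.
    mu = sum(s[i][i] for i in range(p)) / p
    # d2: squared distance between S and mu * I (Frobenius norm divided by p).
    d2 = sum((s[i][j] - (mu if i == j else 0.0)) ** 2
             for i in range(p) for j in range(p)) / p
    # b_bar2: estimated variance of S, from the spread of the rank-one terms x x^T.
    b_bar2 = sum((x[i] * x[j] - s[i][j]) ** 2
                 for x in xs for i in range(p) for j in range(p)) / (p * n * n)
    b2 = min(b_bar2, d2)
    # If S is already proportional to I, any intensity gives the same result.
    lam = b2 / d2 if d2 > 0 else 1.0
    shrunk = [[lam * (mu if i == j else 0.0) + (1.0 - lam) * s[i][j]
               for j in range(p)] for i in range(p)]
    return shrunk, lam
```

Because the target `mu * I` has the same trace as `S`, shrinkage leaves the total variance unchanged and only pulls the eigenvalues toward their mean. The paper's observation is that the consistency proof behind this intensity implicitly bounds how those eigenvalues may grow and spread.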


Calibrated Structured Prediction

Neural Information Processing Systems

In user-facing applications, displaying calibrated confidence measures (probabilities that correspond to true frequencies) can be as important as obtaining high accuracy. We are interested in calibration for structured prediction problems such as speech recognition, optical character recognition, and medical diagnosis. Structured prediction presents new challenges for calibration: the output space is large, and users may issue many types of probability queries (e.g., marginals) on the structured output. We extend the notion of calibration so as to handle various subtleties pertaining to the structured setting, and then provide a simple recalibration method that trains a binary classifier to predict probabilities of interest. We explore a range of features appropriate for structured recalibration, and demonstrate their efficacy on three real-world datasets.
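To make "recalibration with a binary classifier" concrete, the sketch below uses about the simplest such classifier: histogram binning on a single feature, the model's own reported probability for one event (for instance, the marginal that a given character in an OCR output is correct). The classifier, the single feature, and the bin count are assumptions of this sketch; the paper explores richer features for the structured setting.

```python
from typing import List, Optional, Sequence


def _bin(score: float, n_bins: int) -> int:
    # Map a probability in [0, 1] to one of n_bins equal-width bins.
    return min(int(score * n_bins), n_bins - 1)


def fit_histogram_recalibrator(scores: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> List[Optional[float]]:
    """Learn, per score bin, how often the event actually happened.

    labels[k] is 1 if the queried event was true for example k, else 0.
    Empty bins are stored as None.
    """
    counts = [0] * n_bins
    positives = [0] * n_bins
    for s, y in zip(scores, labels):
        b = _bin(s, n_bins)
        counts[b] += 1
        positives[b] += y
    return [pos / c if c else None for pos, c in zip(positives, counts)]


def recalibrate(score: float, table: List[Optional[float]]) -> float:
    """Replace a raw score by its bin's empirical frequency (raw score if unseen)."""
    rate = table[_bin(score, len(table))]
    return score if rate is None else rate
```

A model that reports about 0.9 confidence but is right only half the time in that range will have those scores mapped to 0.5, which is the sense in which the displayed probability then "corresponds to true frequency".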


Deep Voice 2: Multi-Speaker Neural Text-to-Speech

Neural Information Processing Systems

We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a pipeline similar to that of Deep Voice 1 but is constructed with higher-performance building blocks and demonstrates a significant audio quality improvement over Deep Voice 1. We improve Tacotron by introducing a post-processing neural vocoder, and demonstrate a significant audio quality improvement. We then demonstrate our technique for multi-speaker speech synthesis for both Deep Voice 2 and Tacotron on two multi-speaker TTS datasets.
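The core idea, one shared model plus a small trainable vector per speaker, can be sketched as a single conditioned layer. This is a minimal sketch with hypothetical names and dimensions, randomly initialized and untrained: each speaker id indexes a low-dimensional embedding, and an affine projection followed by a softsign nonlinearity turns that embedding into a bias added to the layer's activations. The additive-bias conditioning is an illustrative assumption; a full multi-speaker model would inject speaker information at several points in its pipeline and learn these parameters jointly with the rest of the network.

```python
import random
from typing import List


def softsign(v: float) -> float:
    # Squashes to (-1, 1), like tanh but with polynomial tails.
    return v / (1.0 + abs(v))


class SpeakerConditionedLayer:
    """Adds a speaker-dependent bias to a hidden vector (hypothetical helper)."""

    def __init__(self, n_speakers: int, emb_dim: int, hidden_dim: int,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        # One low-dimensional embedding per speaker (trainable in a real model).
        self.embeddings = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)]
                           for _ in range(n_speakers)]
        # Affine projection from embedding space to the hidden size.
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)]
                       for _ in range(hidden_dim)]
        self.bias = [0.0] * hidden_dim

    def speaker_bias(self, speaker_id: int) -> List[float]:
        e = self.embeddings[speaker_id]
        return [softsign(sum(w * x for w, x in zip(row, e)) + b)
                for row, b in zip(self.weight, self.bias)]

    def __call__(self, hidden: List[float], speaker_id: int) -> List[float]:
        return [h + s for h, s in zip(hidden, self.speaker_bias(speaker_id))]
```

Adding a new voice costs only one extra embedding row (`emb_dim` parameters) while the projection and the rest of the network stay shared, which is why a single model can cover many speakers.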


Utopia Global Releases Intelligent Data Capture and Control Software

Utopia Global, Inc., a leading global data solutions company known for its end-to-end data quality, data migration, and data governance software solutions, has announced the launch of a new cloud-based intelligent software platform: Intelligent Data Capture and Control (IDCC). IDCC is Utopia's new cloud-based master data enrichment and governance solution. It provides asset-intensive organizations with an automated, easily deployable suite to rapidly improve the quality of asset master data in their SAP or non-SAP maintenance systems of record. "Utopia is thrilled to continue our co-innovation with the SAP Asset Management team, IDCC being our newest contribution to the SAP Intelligent Asset Management solution. This release of IDCC provides access to our robust machine learning engine for creating high quality material and asset master data from multiple sources, including unstructured content," said Arvind J. Singh, Chairman and CEO of Utopia Global, Inc. IDCC uniquely leverages optical character recognition, Utopia's advanced machine learning code, intelligent online web search, and document search.


Univar Solutions EMEA Leverages OpenText Enhancements to Operations

OpenText, a global leader in Enterprise Information Management (EIM), announced that Univar Solutions EMEA, a leading distributor of chemical ingredients and services in Europe, is working with OpenText Professional Services to upgrade its deployment of OpenText Vendor Invoice Management for SAP Solutions to further transform its accounts payable operations with new AI, intelligent capture and automation capabilities. OpenText Vendor Invoice Management for SAP routes invoices automatically to the right person for resolution, approval and payment. New enhancements to the solution will boost Univar Solutions EMEA's operations by giving the company access to OCR line item recognition, improving invoice training and automating previously manual freight processing and costing. "Deep integration between OpenText and SAP is helping us continuously streamline our accounts payable processes, while continuing to find productivity gains through automation and innovation," said Brian Morgan, IT director EMEA, Univar. "We are working with OpenText Professional Services to take advantage of new capabilities in AI and process automation, ensuring that our people are focused on the customer-facing work which matters most to our business." Powerful optical character recognition combined with machine learning and intelligent automation enables content to be matched against supplier delivery notes. This helps Univar Solutions EMEA continuously identify and remove bottlenecks and automatically correct errors or inefficiencies before they impact customer satisfaction. Advanced analytics and reporting tools give Univar Solutions EMEA greater visibility over its accounts payable processes, helping ensure governance, compliance and clarity. OpenText helps companies connect business applications, digital business processes and proprietary company content.