Optical Character Recognition
Introducing The AI Reading Machine That Reconstructs Books As Illustrated Haikus
The design duo of Karen Ann Donnachie and Andy Simionato is set to receive the Tokyo Type Directors Club award for their AI reading machine project – a machine that transforms books into short haikus accompanied by related images. It does this by using computer vision and optical character recognition to 'read' books. Then, with machine learning and natural language processing, it selects a poetic combination of words and erases the rest to form an artsy-looking haiku. While doing this, the reading machine also uses Google to search for images that relate to those words. Donnachie and Simionato have released a series of books that we know and love, each with this slight twist.
6 strategies for building AI-based software TechBeacon
Developing software that incorporates artificial intelligence (AI) can be unpredictable, and you need a unique set of knowledge and skills to code, test, and make sense of the data. What's more, tuning the system can take time, and the decisions AI-based software makes can sometimes be difficult to explain. My organization specializes in developing software test automation tools that help users develop tests that run on different platforms, such as desktop computers and mobile devices. We wanted to make it even easier to write and run these tests, and to avoid having to customize the test for each platform. Our research led us to adopt natural-language processing, which allows users of our software to describe a test using simple English, and computer vision with optical character recognition to identify the objects on a screen.
Iterative Learning for Reliable Crowdsourcing Systems
Karger, David R., Oh, Sewoong, Shah, Devavrat
Crowdsourcing systems, in which tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large-scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all crowdsourcers must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in some way such as majority voting. In this paper, we consider a general model of such crowdsourcing tasks, and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give new algorithms for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm significantly outperforms majority voting and, in fact, is asymptotically optimal through comparison to an oracle that knows the reliability of every worker.
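The majority-voting baseline mentioned in the abstract above can be sketched in a few lines. This is only an illustration of the baseline, not the paper's iterative inference algorithm; the function name `majority_vote`, the task ids, and the ±1 answer encoding are assumptions made for the example.

```python
from collections import Counter

def majority_vote(answers_by_task):
    """Return the most frequent answer for each task.

    answers_by_task maps a (hypothetical) task id to the list of answers
    collected from the workers assigned to that task. Ties go to the
    answer that was seen first.
    """
    return {
        task: Counter(answers).most_common(1)[0][0]
        for task, answers in answers_by_task.items()
    }

# Each task is assigned to three workers; answers are +1 or -1.
answers = {
    "t1": [+1, +1, -1],
    "t2": [-1, -1, -1],
    "t3": [+1, -1, -1],
}
labels = majority_vote(answers)
print(labels)
```

Majority voting weights every worker equally, which is why a method that also estimates each worker's reliability can reach the same accuracy with fewer task assignments.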
Volume Regularization for Binary Classification
We introduce a large-volume box classification for binary prediction, which maintains a subset of weight vectors, and specifically axis-aligned boxes. Our learning algorithm seeks a box of large volume that contains "simple" weight vectors, most of which are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and it is shown how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound and it is shown to minimize a quantity directly aligned with it. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of 30 NLP datasets and binarized USPS optical character recognition datasets.
Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
Bartz, Daniel, Müller, Klaus-Robert
Analytic shrinkage is a statistical technique that offers a fast alternative to cross-validation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency implies bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. In addition, we propose an extension of analytic shrinkage --orthogonal complement shrinkage-- which adapts to the covariance structure. Finally we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience.
Calibrated Structured Prediction
Kuleshov, Volodymyr, Liang, Percy S.
In user-facing applications, displaying calibrated confidence measures (probabilities that correspond to true frequency) can be as important as obtaining high accuracy. We are interested in calibration for structured prediction problems such as speech recognition, optical character recognition, and medical diagnosis. Structured prediction presents new challenges for calibration: the output space is large, and users may issue many types of probability queries (e.g., marginals) on the structured output. We extend the notion of calibration so as to handle various subtleties pertaining to the structured setting, and then provide a simple recalibration method that trains a binary classifier to predict probabilities of interest. We explore a range of features appropriate for structured recalibration, and demonstrate their efficacy on three real-world datasets.
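To make "probabilities that correspond to true frequency" concrete, the sketch below bins predicted probabilities and compares each bin's mean confidence with the observed frequency of correct outcomes. This is a generic binned calibration check (often called expected calibration error), not the recalibration method proposed in the paper; the function name, the bin count, and the toy data are assumptions for illustration.

```python
def expected_calibration_error(probs, outcomes, n_bins=5):
    """Weighted average gap between mean confidence and observed frequency.

    probs    -- predicted probabilities in [0, 1]
    outcomes -- 1 if the predicted event occurred, else 0
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # A probability of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        frequency = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - frequency)
    return ece

# Calibrated: predictions of 0.5 come true half of the time.
calibrated = expected_calibration_error([0.5, 0.5], [1, 0])
# Overconfident: predictions of 1.0 come true only half of the time.
overconfident = expected_calibration_error([1.0, 1.0], [1, 0])
print(calibrated, overconfident)
```

A value near zero means the reported probabilities can be read as frequencies; larger values indicate over- or under-confidence.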
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
Gibiansky, Andrew, Arik, Sercan, Diamos, Gregory, Miller, John, Peng, Kainan, Ping, Wei, Raiman, Jonathan, Zhou, Yanqi
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a similar pipeline to Deep Voice 1 but is constructed with higher-performance building blocks and demonstrates a significant audio quality improvement over Deep Voice 1. We improve Tacotron by introducing a post-processing neural vocoder, and demonstrate a significant audio quality improvement. We then demonstrate our technique for multi-speaker speech synthesis for both Deep Voice 2 and Tacotron on two multi-speaker TTS datasets.
Natural Language Processing (NLP) Market to Reach USD 80.68 billion by 2026; Increasing Demand for Enhanced Algorithms to Boost Growth, says Fortune Business Insights
Key companies covered in the NLP market research report are 3M Company, Adobe Systems Inc., Amazon Web Services Inc., Apple Inc., Google (Alphabet Inc.), Hewlett-Packard Enterprise Company, Intel Corporation, Microsoft Corporation, SAS Institute Inc., and other key market players. The global Natural Language Processing (NLP) market size is projected to reach USD 80.68 billion by 2026, exhibiting a CAGR of 32.4% during the forecast period. This information is published by Fortune Business Insights in a report titled "Natural Language Processing (NLP) Market Size, Share & Industry Analysis, By Deployment (On-Premises, Cloud, and Hybrid), By Technology (Interactive Voice Response (IVR), Optical Character Recognition (OCR), Text Analytics, Speech Analytics, Classification and Categorization, Pattern and Image Recognition, and Others), By Industry Vertical (Healthcare, Retail, High Tech and Telecom, BFSI, Automotive & Transportation, Advertising & Media, Manufacturing, and Others) and Regional Forecast, 2019-2026." The report further states that the market was USD 8.61 billion in 2018. It is set to gain momentum from the rising demand for big data, improved algorithms, and powerful computing.
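As a rough sanity check on the figures quoted above, the compound annual growth rate implied by the 2018 and 2026 market sizes can be computed directly. The sketch assumes eight years of compounding from the 2018 base; the report's own convention for the 2019-2026 forecast period is not stated in the text, so a small difference from the quoted 32.4% is expected. The function name `implied_cagr` is hypothetical.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures from the text: USD 8.61 billion (2018), USD 80.68 billion (2026).
# Assumption: 8 compounding periods between the two values.
rate = implied_cagr(8.61, 80.68, 8)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption the implied rate comes out at roughly 32%, in line with the reported figure.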
Utopia Global Releases Intelligent Data Capture and Control Software
Utopia Global, Inc., a leading global data solutions company known for its end-to-end data quality, data migration, and data governance software solutions, has announced the launch of a new cloud-based intelligent software platform: Intelligent Data Capture and Control (IDCC). IDCC is Utopia's new cloud-based master data enrichment and governance solution. It provides asset-intensive organizations with an automated, easily deployable suite to rapidly improve the quality of asset master data in their SAP or non-SAP maintenance systems of record. "Utopia is thrilled to continue our co-innovation with the SAP Asset Management team, IDCC being our newest contribution to the SAP Intelligent Asset Management solution. This release of IDCC provides access to our robust machine learning engine for creating high quality material and asset master data from multiple sources, including unstructured content," said Arvind J. Singh, Chairman and CEO of Utopia Global, Inc. IDCC uniquely leverages optical character recognition, Utopia's advanced machine learning code, intelligent online web search, and document search.
BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization
Moss, Henry B., Aggarwal, Vatsal, Prateek, Nishant, González, Javier, Barra-Chicote, Roberto
We present BOFFIN TTS (Bayesian Optimization For FIne-tuning Neural Text To Speech), a novel approach for few-shot speaker adaptation. Here, the task is to fine-tune a pre-trained TTS model to mimic a new speaker using a small corpus of target utterances. We demonstrate that there does not exist a one-size-fits-all adaptation strategy, with convincing synthesis requiring a corpus-specific configuration of the hyper-parameters that control fine-tuning. By using Bayesian optimization to efficiently optimize these hyper-parameter values for a target speaker, we are able to perform adaptation with an average 30% improvement in speaker similarity over standard techniques. Results indicate, across multiple corpora, that BOFFIN TTS can learn to synthesize new speakers using less than ten minutes of audio, achieving the same naturalness as produced for the speakers used to train the base model.