AITopics | Optical Character Recognition

Optical Character Recognition

Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. PATTERN RECOGNITION BY MACHINE. In Computers & Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.

Calibrated Structured Prediction

Kuleshov, Volodymyr, Liang, Percy S.

Neural Information Processing Systems, Feb-14-2020, 14:56:19 GMT

In user-facing applications, displaying calibrated confidence measures (probabilities that correspond to true frequency) can be as important as obtaining high accuracy. We are interested in calibration for structured prediction problems such as speech recognition, optical character recognition, and medical diagnosis. Structured prediction presents new challenges for calibration: the output space is large, and users may issue many types of probability queries (e.g., marginals) on the structured output. We extend the notion of calibration so as to handle various subtleties pertaining to the structured setting, and then provide a simple recalibration method that trains a binary classifier to predict probabilities of interest. We explore a range of features appropriate for structured recalibration, and demonstrate their efficacy on three real-world datasets.
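As a rough illustration of the recalibration idea in the abstract above, the sketch below fits a binary classifier that maps a model's raw confidence to the probability that its prediction is correct. It is only a minimal sketch under stated assumptions: the single input feature, the logistic-regression form, the gradient-descent settings, and the toy data are all hypothetical stand-ins, not the features or classifier used in the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_recalibrator(confidences, correct, lr=1.0, epochs=5000):
    """Fit p(correct | confidence) = sigmoid(w * confidence + b).

    Plain batch gradient descent on the logistic loss; the single
    feature (raw confidence) is an assumption made for this sketch.
    """
    w, b = 0.0, 0.0
    n = len(confidences)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(confidences, correct):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def recalibrate(confidence, w, b):
    """Map a raw confidence to a recalibrated probability of being correct."""
    return sigmoid(w * confidence + b)

# Toy data (made up): raw confidences and whether each prediction was right.
raw_conf = [0.95, 0.90, 0.92, 0.85, 0.80, 0.70, 0.65, 0.60, 0.55, 0.50]
was_correct = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

w, b = fit_recalibrator(raw_conf, was_correct)
calibrated_high = recalibrate(0.95, w, b)
calibrated_low = recalibrate(0.50, w, b)
print(f"raw 0.95 -> {calibrated_high:.2f}, raw 0.50 -> {calibrated_low:.2f}")
```

In this toy data the raw confidences overstate how often the model is right, so the fitted mapping pulls them toward the observed frequency of correct predictions.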

calibrated structured prediction, probability, recognition, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.94)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.67)

Deep Voice 2: Multi-Speaker Neural Text-to-Speech

Gibiansky, Andrew, Arik, Sercan, Diamos, Gregory, Miller, John, Peng, Kainan, Ping, Wei, Raiman, Jonathan, Zhou, Yanqi

Neural Information Processing Systems, Feb-14-2020, 11:44:31 GMT

We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a pipeline similar to that of Deep Voice 1 but is constructed with higher-performance building blocks, and demonstrate a significant audio quality improvement over Deep Voice 1. We improve Tacotron by introducing a post-processing neural vocoder, and demonstrate a significant audio quality improvement. We then demonstrate our technique for multi-speaker speech synthesis for both Deep Voice 2 and Tacotron on two multi-speaker TTS datasets.

deep voice 1, deep voice 2, multi-speaker neural text-to-speech, (2 more...)

Genre: Research Report (0.44)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.65)
Information Technology > Artificial Intelligence > Assistive Technologies (0.65)

Natural Language Processing (NLP) Market to Reach USD 80.68 billion by 2026; Increasing Demand for Enhanced Algorithms to Boost Growth, says Fortune Business Insights

#artificialintelligence, Feb-5-2020, 18:02:59 GMT

Key companies covered in the NLP market research report are 3M Company, Adobe Systems Inc., Amazon Web Services Inc., Apple Inc., Google (Alphabet Inc.), Hewlett-Packard Enterprise Company, Intel Corporation, Microsoft Corporation, SAS Institute Inc., and other key market players. The global Natural Language Processing (NLP) market size is projected to reach USD 80.68 billion by 2026, exhibiting a CAGR of 32.4% during the forecast period. This information is published by Fortune Business Insights in a report titled "Natural Language Processing (NLP) Market Size, Share & Industry Analysis, By Deployment (On-Premises, Cloud, and Hybrid), By Technology (Interactive Voice Response (IVR), Optical Character Recognition (OCR), Text Analytics, Speech Analytics, Classification and Categorization, Pattern and Image Recognition, and Others), By Industry Vertical (Healthcare, Retail, High Tech and Telecom, BFSI, Automotive & Transportation, Advertising & Media, Manufacturing, and Others) and Regional Forecast, 2019-2026." The report further states that the market was USD 8.61 billion in 2018. It is set to gain momentum from the rising demand for big data, improved algorithms, and powerful computing.
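The growth figures quoted above can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes the 2018 value is the base and that growth compounds over the eight years to 2026; the report's own base year and forecast window (2019-2026) may differ, so the result should be close to, but need not exactly match, the quoted 32.4%.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures from the snippet, in USD billions. Using 2018 as the base year
# is an assumption made for this check, not something the report states.
market_2018 = 8.61
market_2026 = 80.68
implied_rate = cagr(market_2018, market_2026, 2026 - 2018)
print(f"Implied CAGR: {implied_rate:.1%}")
```

Under these assumptions the implied rate comes out at roughly 32% per year, in line with the reported figure.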

industry analysis, market size, regional forecast, (14 more...)

Country:

Asia > China (0.05)
South America (0.05)
North America > United States (0.05)
(6 more...)

Genre: Research Report (0.69)

Industry:

Information Technology > Services (0.69)
Information Technology > Security & Privacy (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.55)
Information Technology > Data Science > Data Mining > Big Data (0.50)

Utopia Global Releases Intelligent Data Capture and Control Software

#artificialintelligence, Feb-4-2020, 12:28:37 GMT

Utopia Global, Inc., a leading global data solutions company known for its end-to-end data quality, data migration, and data governance software solutions, has announced the launch of a new cloud-based intelligent software platform: Intelligent Data Capture and Control (IDCC). IDCC is Utopia's new cloud-based master data enrichment and governance solution. It provides asset-intensive organizations with an automated, easily deployable suite to rapidly improve the quality of asset master data in their SAP or non-SAP maintenance systems of record. "Utopia is thrilled to continue our co-innovation with the SAP Asset Management team, IDCC being our newest contribution to the SAP Intelligent Asset Management solution. This release of IDCC provides access to our robust machine learning engine for creating high-quality material and asset master data from multiple sources, including unstructured content," said Arvind J. Singh, Chairman and CEO of Utopia Global, Inc. IDCC combines optical character recognition, Utopia's advanced machine learning code, intelligent online web search, and document search.

data capture and control software, global release intelligent data capture, intelligent data capture, (5 more...)

Industry:

Information Technology > Services (0.90)
Banking & Finance (0.69)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.60)

BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization

Moss, Henry B., Aggarwal, Vatsal, Prateek, Nishant, González, Javier, Barra-Chicote, Roberto

arXiv.org Machine Learning, Feb-4-2020

We present BOFFIN TTS (Bayesian Optimization For FIne-tuning Neural Text To Speech), a novel approach for few-shot speaker adaptation. Here, the task is to fine-tune a pre-trained TTS model to mimic a new speaker using a small corpus of target utterances. We demonstrate that there does not exist a one-size-fits-all adaptation strategy, with convincing synthesis requiring a corpus-specific configuration of the hyper-parameters that control fine-tuning. By using Bayesian optimization to efficiently optimize these hyper-parameter values for a target speaker, we are able to perform adaptation with an average 30% improvement in speaker similarity over standard techniques. Results indicate, across multiple corpora, that BOFFIN TTS can learn to synthesize new speakers using less than ten minutes of audio, achieving the same naturalness as produced for the speakers used to train the base model.

adaptation, boffin tts, target speaker, (15 more...)

arXiv: 2002.01953

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.36)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.35)

Univar Solutions EMEA Leverages OpenText Enhancements to Operations

#artificialintelligence, Jan-17-2020, 16:46:27 GMT

OpenText, a global leader in Enterprise Information Management (EIM), announced that Univar Solutions EMEA, a leading distributor of chemical ingredients and services in Europe, is working with OpenText Professional Services to upgrade its deployment of OpenText Vendor Invoice Management for SAP Solutions to further transform its accounts payable operations with new AI, intelligent capture and automation capabilities. OpenText Vendor Invoice Management for SAP routes invoices automatically to the right person for resolution, approval and payment. New enhancements to the solution will boost Univar Solutions EMEA's operations by giving the company access to OCR line item recognition, improving invoice training and automating previously manual freight processing and costing. "Deep integration between OpenText and SAP is helping us continuously streamline our accounts payable processes, while continuing to find productivity gains through automation and innovation," said Brian Morgan, IT director EMEA, Univar. "We are working with OpenText Professional Services to take advantage of new capabilities in AI and process automation, ensuring that our people are focused on the customer-facing work which matters most to our business." Powerful optical character recognition combined with machine learning and intelligent automation enables content to be matched against supplier delivery notes. This helps Univar Solutions EMEA continuously identify and remove bottlenecks and automatically correct errors or inefficiencies before they impact customer satisfaction. Advanced analytics and reporting tools give Univar Solutions EMEA greater visibility over its accounts payable processes, helping ensure governance, compliance and clarity. "OpenText helps companies connect business applications, digital business processes and proprietary company content.

opentext vendor invoice management, solution emea leverage opentext enhancement, univar solution emea, (7 more...)

Country:

Europe (0.27)
Asia > China (0.07)

Industry: Professional Services (0.49)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.58)
Information Technology > Artificial Intelligence > Machine Learning (0.58)
Information Technology > e-Commerce > Financial Technology (0.38)

Amazon researchers use AI to improve the recognition of curved text

#artificialintelligence, Dec-29-2019, 14:06:17 GMT

Optical character recognition (OCR), or the conversion of images of handwritten or printed text into machine-readable text, is a science that dates back to the early '70s. But algorithms have long struggled to make out characters that aren't parallel with horizontal planes, which is why researchers at Amazon developed what they call TextTubes. They're detectors for curved text in natural images that model said text as tubes around their medial (middle) axes. In a paper describing their work, the coauthors claim that their approach achieves state-of-the-art results on a popular OCR benchmark. As the researchers explain, scene text is typically broken down into two successive tasks: text detection and text recognition.

amazon researcher use ai, curved text, recognition, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.58)

Looking For An Alternative OCR Technology?

#artificialintelligence, Dec-28-2019, 17:16:20 GMT

Does your OCR technology make sense of the data that is extracted? Traditional OCR technology is less accurate because it does not understand what it is extracting, so a considerable number of errors occur. Removing such errors requires manual fixing, which is time-consuming and demands significant resources. The AI-powered Infrrd OCR addresses these difficulties by applying machine learning algorithms to understand the extracted data and improve the output automatically. When it comes to choosing an OCR app, accuracy is one of the most important criteria.

alternative ocr technology, information, receipt, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.33)

Evaluating Usage of Images for App Classification

Singla, Kushal, Mukherjee, Niloy, Koduvely, Hari Manassery, Bose, Joy

arXiv.org Machine Learning, Dec-16-2019

App classification is useful in a number of applications such as adding apps to an app store or building a user model based on the installed apps. Presently there are a number of existing methods to classify apps based on a given taxonomy on the basis of their text metadata. However, text-based methods for app classification may not work in all cases, such as when the text descriptions are in a different language, or missing, or inadequate to classify the app. One solution in such cases is to utilize the app images to supplement the text description. In this paper, we evaluate a number of approaches in which app images can be used to classify the apps. In one approach, we use optical character recognition (OCR) to extract text from images, which is then used to supplement the text description of the app. In another, we use pic2vec to convert the app images into vectors, then train an SVM to classify the vectors to the correct app label. In another, we use the captionbot.ai tool to generate natural language descriptions from the app images. Finally, we use a method to detect and label objects in the app images and use a voting technique to determine the category of the app based on all the images. We compare the performance of our image-based techniques to classify a number of apps in our dataset. We use a text-based SVM app classifier as our baseline and obtain an improved classification accuracy of 96% for some classes when app images are added.
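The voting step mentioned in the abstract above can be pictured as a simple majority vote over per-image predictions. The sketch below is an assumption-laden illustration, not the authors' procedure: it supposes each app image has already been mapped to one candidate category, and the function name and example labels are hypothetical.

```python
from collections import Counter

def vote_category(per_image_labels):
    """Return the category predicted for the most images.

    On a tie, the category that appeared first in the input wins,
    because Counter preserves insertion order.
    """
    if not per_image_labels:
        raise ValueError("need at least one per-image label")
    counts = Counter(per_image_labels)
    return counts.most_common(1)[0][0]

# Hypothetical per-image predictions for one app's screenshots.
image_predictions = ["games", "games", "education", "games", "productivity"]
app_category = vote_category(image_predictions)
print(app_category)
```

A weighted vote (for example, by detection confidence) would be a natural variant, but nothing in the snippet says which scheme the paper uses.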

app, app image, classification, (14 more...)

arXiv: 1912.12144

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Geneva > Geneva (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.35)

Latest Version of the Appian Low-code Platform Now Available

#artificialintelligence, Dec-10-2019, 20:05:38 GMT

TYSONS, VA – Appian (NASDAQ: APPN) today announced the latest version of the Appian Platform. The new release of the low-code application development platform increases the speed and business impact of low-code automation for developers, administrators, and end-users. The latest version delivers enhancements to Appian AI, further expansion of Appian's Connected Systems architecture, integrated Health Check in every application, and simplified DevOps, making it easier than ever to develop, deploy, change, and manage Appian applications. Appian AI, a fast way to add best-of-breed artificial intelligence to any Appian application, now offers Google Cloud Translation as a Connected System. Customers can enable any app to detect languages and translate text with no coding. In addition, this release provides an updated Google Cloud Vision Connected System, which now offers integration with Optical Character Recognition (OCR).

appian, application, forward-looking statement, (15 more...)

Genre: Press Release (0.49)

Industry:

Information Technology > Security & Privacy (0.51)
Banking & Finance > Trading (0.37)

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.56)
