AITopics | Optical Character Recognition

Optical Character Recognition

Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. PATTERN RECOGNITION BY MACHINE. In Computers & Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.

Text to speech Python Tutorial

#artificialintelligence | Aug-12-2019, 20:09:04 GMT

We can make the computer speak with Python. Given a text string, it will speak the written words in English. This process is called text to speech (TTS). pyttsx is a cross-platform text-to-speech wrapper that drives the speech engine available on the operating system; the separate gTTS module uses the Google Text to Speech (TTS) API.
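As a rough illustration of the idea, the sketch below speaks a string through an offline engine. It assumes the third-party pyttsx3 package (a Python 3 fork of pyttsx, not named in the summary above) is installed; the `speak` helper and its `engine` parameter are hypothetical names introduced here so the function can also be exercised without audio hardware.

```python
def speak(text, engine=None, rate=150):
    """Speak `text` aloud and return the engine that was used.

    `engine` is a hypothetical injection point for testing; when it is
    omitted, a pyttsx3 engine is created (assumes pyttsx3 is installed).
    """
    if engine is None:
        import pyttsx3  # third-party package, imported lazily
        engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    engine.say(text)                  # queue the utterance
    engine.runAndWait()               # block until the queue has been spoken
    return engine
```

With pyttsx3 installed, `speak("Hello, world")` should read the string aloud using whichever speech driver the platform provides.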

speech engine, speech python tutorial, tts

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Assistive Technologies (1.00)

Amazon's Text-To-Speech AI Service Sounds More Natural And Realistic

#artificialintelligence | Aug-12-2019, 00:19:18 GMT

Amazon has enhanced Polly, its cloud-based text-to-speech service, to deliver more natural and realistic speech synthesis. The service can now present domain-specific speaking styles such as newscast and sportscast. Though text-to-speech has existed for more than two decades, it has never been used in mainstream media because it lacked natural and realistic modulation. Except for automated announcements read out from existing datastores, the technology has never replaced the human voice. Thanks to advances in AI, text-to-speech has become natural and realistic to the extent that it may be hard to distinguish from a human voice.

artificial intelligence, optical character recognition, speech

Country:

North America > United States > Virginia (0.06)
North America > United States > Oregon (0.06)
North America > Canada (0.06)
Europe > Ireland (0.06)

Industry: Information Technology > Services (0.43)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Assistive Technologies (1.00)

Japan Post could end Saturday standard mail deliveries next year after ministry moves to stop service

The Japan Times | Aug-7-2019, 06:45:43 GMT

A government panel decided Tuesday to end Saturday delivery for standard mail to deal with a labor shortage at Japan Post Co. and a drop in demand due to increased use of the internet. The Internal Affairs and Communications Ministry accepted the panel's proposal and will seek a law amendment at an extraordinary Diet session this fall. Saturday delivery could be terminated, possibly next year, after which standard mail would be delivered only on weekdays. The panel also proposed ending next-day delivery of standard mail. Japan Post, a unit of Japan Post Holdings Co., has been calling for a review to trim standard mail service to five days a week from the current six to address the workforce shortage.

end saturday standard mail delivery, japan post, ministry move

Country: Asia > Japan (1.00)

Industry: Government > Post Office (1.00)

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.40)

RNN-based Online Handwritten Character Recognition Using Accelerometer and Gyroscope Data

Soselia, Davit, Amashukeli, Shota, Koberidze, Irakli, Shugliashvili, Levan

arXiv.org Machine Learning | Jul-24-2019

This abstract explores an RNN-based approach to the online handwritten recognition problem. Our method uses data from an accelerometer and a gyroscope mounted on a handheld pen-like device to train and run a character prediction model. We have built a dataset of timestamped gyroscope and accelerometer data gathered during the manual process of handwriting Latin characters, labeled with the character being written. In total, the dataset consists of 1500 gyroscope and accelerometer data sequences for 8 characters of the Latin alphabet from 6 different people, and 1500 samples each for 20 characters of the Georgian alphabet from 5 different people, with each sequence containing the gyroscope and accelerometer data captured during the writing of a particular character, sampled once every 10 ms. We train an RNN-based neural network architecture on this dataset to predict the character being written. The model is optimized with categorical cross-entropy loss and the RMSprop optimizer and achieves high accuracy on test data.
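To make the optimization terms in this abstract concrete, here is a minimal pure-Python sketch of categorical cross-entropy and a single RMSprop update. It is not the authors' model: there is no RNN here, only a vector of class logits trained directly, and the function names, learning rate, and decay constant are assumptions chosen for illustration. The default of 8 classes mirrors the 8 Latin characters mentioned above.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_index):
    """Loss for one example: negative log-probability of the true class."""
    return -math.log(probs[true_index])


def rmsprop_step(params, grads, cache, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSprop update: scale each gradient by a running RMS of its history."""
    new_params, new_cache = [], []
    for p, g, c in zip(params, grads, cache):
        c = rho * c + (1.0 - rho) * g * g
        new_params.append(p - lr * g / (math.sqrt(c) + eps))
        new_cache.append(c)
    return new_params, new_cache


def train_logits(target, n_classes=8, steps=20, lr=0.1):
    """Hypothetical toy loop: fit bare logits to one labelled example."""
    logits = [0.0] * n_classes
    cache = [0.0] * n_classes
    losses = []
    for _ in range(steps):
        probs = softmax(logits)
        losses.append(categorical_cross_entropy(probs, target))
        # Gradient of the loss w.r.t. the logits is (probs - one_hot(target)).
        grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
        logits, cache = rmsprop_step(logits, grads, cache, lr=lr)
    return losses
```

Calling `train_logits(target=2)` returns the per-step losses, which should shrink as the logit of the target class is pushed up relative to the others.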

artificial intelligence, handwritten character recognition, machine learning

1907.12935

Country: North America > United States (0.16)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Rosetta: Understanding text in images and videos with machine learning - Facebook Code

#artificialintelligence | Jul-22-2019, 06:08:25 GMT

Understanding the text that appears on images is important for improving experiences, such as a more relevant photo search or the incorporation of text into screen readers that make Facebook more accessible for the visually impaired. Understanding text in images along with the context in which it appears also helps our systems proactively identify inappropriate or harmful content and keep our community safe. A significant number of the photos shared on Facebook and Instagram contain text in various forms. It might be overlaid on an image in a meme, or inlaid in a photo of a storefront, street sign, or restaurant menu. Taking into account the sheer volume of photos shared each day on Facebook and Instagram, the number of languages supported on our global platform, and the variations of the text, the problem of understanding text in images is quite different from those solved by traditional optical character recognition (OCR) systems, which recognize the characters but don't understand the context of the associated image.

artificial intelligence, machine learning, optical character recognition

Industry: Consumer Products & Services > Restaurants (0.54)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

RPA, AI help speed review of Medicare claims -- GCN

#artificialintelligence | Jul-10-2019, 13:38:37 GMT

Employees and contractors at the Centers for Medicare & Medicaid Services spend countless hours every year reviewing thousands of medical records to ensure the accuracy of Medicare Advantage payments. An automated intake tool is working to change that. Using emerging technologies such as robotic process automation, optical character recognition, machine learning and artificial intelligence, KPMG's Intake Process Automation Tool ingests records as they are submitted and identifies potential problems according to set parameters, submission rules and coding guidance. Specifically, RPA orchestrates the steps of the intake process, OCR digitizes the scanned document, and AI and machine learning are then applied to understand the document and extract the data needed to validate it. Intake PA stands to save CMS time and money, said Payam Mousavi, KPMG's lead director for intelligent automation for governments and the technical lead for the CMS project.

artificial intelligence, machine learning, optical character recognition

Country: North America > United States > Virginia (0.05)

Industry:

Health & Medicine > Health Care Providers & Services > Reimbursement (1.00)
Health & Medicine > Government Relations & Public Policy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.56)

Converting Text to Speech with Azure Cognitive Service's REST-Based API

#artificialintelligence | Jul-10-2019, 13:32:33 GMT

Adam Bertram is a 20-year veteran of IT. Adam focuses on DevOps, system management, and automation technologies as well as various cloud platforms. He is a Microsoft Cloud and Datacenter Management MVP and an efficiency nerd who enjoys teaching others a better way to leverage automation.

artificial intelligence, azure cognitive service, optical character recognition

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.40)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.40)
Information Technology > Artificial Intelligence > Assistive Technologies (0.40)

AWS launches Textract, machine learning for text and data extraction

#artificialintelligence | May-30-2019, 19:46:05 GMT

Need to extract content from a document quickly and automatically? Amazon today announced the general availability of Textract, a cloud-hosted and fully managed service that uses machine learning to parse data tables, forms, and whole pages for text and data. [...] Virginia), US West (Oregon), and EU (Ireland) regions and will expand to additional regions in the coming year. Textract is more capable than your average optical character recognition system. From files stored in an Amazon S3 bucket, it's able to suss out the contents of fields and tables and the context in which this information is presented, like names and social security numbers in tax forms or totals from photographed receipts.
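For readers who want to see what reading a file from an S3 bucket might look like, the sketch below calls Textract's plain text detection through boto3 and collects the recognized lines. It assumes boto3 is installed and AWS credentials are configured; the helper names, the default region, and the bucket and key arguments are placeholders introduced here, and the field and table extraction described above would need a different Textract operation than the one shown.

```python
def extract_lines(response):
    """Return the text of every LINE block in a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_text_in_s3(bucket, key, region="us-east-1"):
    """Run text detection on an image stored in S3 (hypothetical helper).

    Assumes boto3 is installed and AWS credentials are available.
    """
    import boto3  # third-party AWS SDK, imported lazily

    client = boto3.client("textract", region_name=region)
    response = client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return extract_lines(response)
```

Keeping the response parsing in `extract_lines` separate from the network call means the parsing can be checked against a saved response without contacting AWS.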

artificial intelligence, data mining, machine learning

Country:

North America > United States > Virginia (0.26)
North America > United States > Oregon (0.26)

Industry:

Government (0.76)
Banking & Finance (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.58)
Information Technology > Data Science > Data Mining > Text Mining (0.40)

FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

Jaume, Guillaume, Ekenel, Hazim Kemal, Thiran, Jean-Philippe

arXiv.org Machine Learning | May-27-2019

In this paper, we present a new dataset for Form Understanding in Noisy Scanned Documents (FUNSD). Form Understanding (FoUn) aims at extracting and structuring the textual content of forms. The dataset comprises 200 fully annotated real scanned forms. The documents are noisy and exhibit large variabilities in their representation, making FoUn a challenging task. The proposed dataset can be used for various tasks including text detection, optical character recognition (OCR), spatial layout analysis and entity labeling/linking. To the best of our knowledge this is the first publicly available dataset with comprehensive annotations addressing the FoUn task. We also present a set of baselines and introduce metrics to evaluate performance on the FUNSD dataset. The FUNSD dataset can be downloaded at https://guillaumejaume.github.io/FUNSD/.

machine learning, natural language, pattern recognition

1905.13538

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Asia > Middle East > Republic of Türkiye (0.04)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.70)

r/MachineLearning - [R] Parallel Neural Text-to-Speech

#artificialintelligence | May-22-2019, 03:02:17 GMT

Abstract: In this work, we propose a non-autoregressive seq2seq model that converts text to spectrogram. It is fully convolutional and obtains about 17.5 times speed-up over Deep Voice 3 at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, it has even fewer attention errors than the autoregressive model on the challenging test sentences. Furthermore, we build the first fully parallel neural text-to-speech system by applying the inverse autoregressive flow (IAF) as the parallel neural vocoder. Our system can synthesize speech from text through a single feed-forward pass. We also explore a novel approach to train the IAF from scratch as a generative model for raw waveform, which avoids the need for distillation from a separately trained WaveNet.

artificial intelligence, optical character recognition, social media

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.68)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.68)
Information Technology > Artificial Intelligence > Assistive Technologies (0.68)
