 Optical Character Recognition


rOpenSci Support for hOCR and Tesseract 4 in R

@machinelearnbot

Earlier this month we released a new version of the tesseract package to CRAN. This package provides R bindings to Tesseract, Google's open source optical character recognition (OCR) engine. Two major new features are support for hOCR and support for the upcoming Tesseract 4. Support for hOCR output was requested by one of our users on GitHub. Every word in the hOCR output includes metadata such as its bounding box and confidence metrics, so it gives us more information about the OCR results than the text alone.
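As a rough illustration of the per-word metadata described above, the following Python sketch pulls the text, bounding box, and confidence out of hOCR markup using only the standard library. It assumes the common hOCR convention of `ocrx_word` spans whose `title` attribute carries `bbox` and `x_wconf` properties; the `HocrWordParser` class, the `parse_hocr_words` helper, and the `SAMPLE` fragment are hypothetical names and hand-written data for this example, not part of the tesseract package.

```python
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect text, bounding box, and confidence for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            title = attrs.get("title") or ""
            bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
            conf = re.search(r"x_wconf (\d+(?:\.\d+)?)", title)
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in bbox.groups()) if bbox else None,
                "conf": float(conf.group(1)) if conf else None,
            }

    def handle_data(self, data):
        # Only keep text that sits inside a word span.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def parse_hocr_words(hocr):
    """Return a list of dicts with keys text, bbox, and conf."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Hand-written hOCR fragment, for illustration only.
SAMPLE = (
    "<span class='ocr_line' title='bbox 36 92 580 116'>"
    "<span class='ocrx_word' id='word_1_1' "
    "title='bbox 36 92 96 116; x_wconf 95'>This</span> "
    "<span class='ocrx_word' id='word_1_2' "
    "title='bbox 109 92 129 116; x_wconf 91'>is</span>"
    "</span>"
)

for word in parse_hocr_words(SAMPLE):
    print(word["text"], word["bbox"], word["conf"])
```

Words with low confidence values can then be filtered out or flagged for manual review, which plain-text OCR output does not allow.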


Enhancing RNN Based OCR by Transductive Transfer Learning From Text to Images

AAAI Conferences

This paper presents a novel approach to optical character recognition (OCR) that accelerates training and avoids underfitting by transferring knowledge from text. Previously proposed OCR models typically take a long time to train and require a large amount of labelled data to avoid underfitting. In contrast, our method has neither requirement. This is a challenging task because it involves transferring the sequential relationships between characters from text to OCR. We build a model based on transductive transfer learning to achieve domain adaptation from text to image. We thoroughly evaluate our approach on different datasets, including a general one and a relatively small one. We also compare the performance of our model with a general OCR model under different circumstances. We show that (1) our approach reduces training time by 20-30%; and (2) our approach can avoid underfitting when the model is trained on a small dataset.


Who's Afraid of Autonomous Mail Trucks? RealClearPolicy

#artificialintelligence

Highly automated vehicles (HAVs) have gone from fantasy to reality in the past decade. It is remarkable to witness the various prototypes motoring about courses and taking test drives on America's roads. Already, related technology helps people parallel park their cars -- some don't even require the driver be in the car. Among other benefits, HAV technology has the potential to save lives and reduce insurance costs by greatly decreasing human errors, which cause 94 percent of accidents. Computers, note two keen observers, don't get drunk or drowsy.


Step inside the MIT lab designing new human-computer interfaces

#artificialintelligence

"A collection of smart devices may not make you smarter. There seems to be a gap between what technology has to offer and what we are naturally able to do." Suranga Nanayakkara slips a black ring onto his finger and points. This ring, he explains, helps visually impaired people read by converting text into speech. Nanayakkara points at a poster on the wall more than a metre away, clicks a small button on the side of the ring, and almost instantaneously a female voice starts reading out the poster's header through the headphones he's wearing. Such optical character recognition technology, or OCR, already exists but is often locked inside clunky highlighter-style devices that are slow and cumbersome.


BrandPost: Ready to tackle the next phase of the digital transformation?

PCWorld

The convergence of mobility and cloud has led to a digital explosion. Now that users have anytime, anywhere, any device access, they are generating mountains of data. In fact, IDC predicts that by 2025, the world will create 160 trillion gigabytes of it. But more important than the volume is what companies do with that data: how they leverage it for heightened customer experiences, for improved day-to-day decision making, and to innovate. As enterprises come to realize what it takes to compete in this digital economy, they understand that they must effectively leverage and manage data.


Google Creates A Text To Speech AI system Alike Human voice

#artificialintelligence

Google has taken a big step towards its 'AI-first' dream. The tech giant has developed a text-to-speech system with remarkably human-like articulation. The AI system, called "Tacotron 2", can generate computer speech in a human-like voice. Google researchers mentioned in a blog post that the new procedure does not utilise complex linguistic and acoustic features as input. Instead, it generates human-like speech from text using neural networks trained only on speech examples and corresponding text transcripts. Google's CEO Sundar Pichai announced that the company will be shifting its focus from mobile-first to AI-first at the Google I/O 2017 developers conference.


Google's New Text-to-Speech AI Is so Good We Bet You Can't Tell It From a Real Human

#artificialintelligence

Can you tell the difference between AI-generated computer speech and a real, live human being? Maybe you've always thought you could. Maybe you're fond of Alexa and Siri but believe you would never confuse either of them with an actual woman. Things are about to get a lot more interesting. Google engineers have been hard at work creating a text-to-speech system called Tacotron 2. According to a paper they published this month, the system first creates a spectrogram of the text, a visual representation of how the speech should sound.


Google develops human-like text-to-speech artificial intelligence system

#artificialintelligence

In a major step towards its "AI first" dream, Google has developed a text-to-speech artificial intelligence (AI) system that will confuse you with its human-like articulation. The tech giant's text-to-speech system, called "Tacotron 2", delivers AI-generated computer speech that almost matches the human voice, technology news website Inc.com reported. At the Google I/O 2017 developers conference, the company's Indian-origin CEO Sundar Pichai announced that the internet giant was shifting its focus from mobile-first to "AI first" and launched several products and features, including Google Lens, Smart Reply for Gmail and Google Assistant for iPhone. According to a paper published on arXiv.org, the system first creates a spectrogram of the text, a visual representation of how the speech should sound. That image is put through Google's existing WaveNet algorithm, which uses the image and brings AI closer than ever to indiscernibly mimicking human speech.


Hacked Dog Pics Can Play Tricks on Computer Vision AI

#artificialintelligence

Researchers at the Massachusetts Institute of Technology (MIT) have demonstrated a new way to fool computer vision algorithms that enable artificial intelligence systems to see. The researchers exploited the Google Cloud Vision API that enables anyone to perform image labeling, face and landmark detection, optical character recognition, and tagging of explicit content. Traditional hacking approaches are inefficient and impractical when targeting large images with tens of thousands of pixels. To overcome this problem, the MIT team adapted a "natural evolution strategies" method that generates smaller populations of images around the larger image, with large random groups of pixels being perturbed instead of single pixels. Then, given the classifier's output on these randomly perturbed images, the system recovers what the contribution of each individual pixel is to the classification output, according to MIT researcher Andrew Ilyas.
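The gradient-recovery idea in the last two sentences can be sketched in a few lines of Python. This is a minimal illustration of a natural-evolution-strategies style gradient estimate with paired (antithetic) Gaussian perturbations, not the MIT team's implementation: the `nes_gradient` function, its parameters, and the toy `score` function standing in for a classifier's output are all assumptions made for the example, and the grouping of pixels into large random blocks is omitted.

```python
import random


def nes_gradient(f, x, sigma=0.1, n_pairs=200, rng=None):
    """Estimate the gradient of a black-box function f at x.

    Only the outputs of f on randomly perturbed copies of x are used,
    never its internals. Perturbations are drawn in +/- pairs.
    """
    rng = rng or random.Random(0)
    grad = [0.0] * len(x)
    for _ in range(n_pairs):
        noise = [rng.gauss(0.0, 1.0) for _ in x]
        plus = f([xi + sigma * ni for xi, ni in zip(x, noise)])
        minus = f([xi - sigma * ni for xi, ni in zip(x, noise)])
        # Each coordinate is credited in proportion to its own noise value.
        for i, ni in enumerate(noise):
            grad[i] += (plus - minus) * ni
    scale = 1.0 / (2.0 * sigma * n_pairs)
    return [g * scale for g in grad]


def score(pixels):
    """Toy stand-in for a classifier output; the true gradient is [3, -2]."""
    return 3.0 * pixels[0] - 2.0 * pixels[1]


estimate = nes_gradient(score, [0.5, 0.5], n_pairs=5000)
print(estimate)
```

With enough sample pairs the estimate approaches the true per-coordinate contribution, which is what allows an attacker to adjust an image using only the outputs of a black-box classifier.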