Optical Character Recognition


Google Creates a Text-to-Speech AI System with a Human-Like Voice

#artificialintelligence

Google has taken a big step towards its 'AI-first' dream. The tech giant has developed a text-to-speech system with human-like articulation. The system, called "Tacotron 2", can deliver AI-generated computer speech in a human-like voice. Google researchers said in a blog post that the new approach does not use complex linguistic and acoustic features as input. Instead, it generates human-like speech from text using neural networks trained only on speech examples and the corresponding text transcripts.


Google's New Text-to-Speech AI Is so Good We Bet You Can't Tell It From a Real Human

#artificialintelligence

Can you tell the difference between AI-generated computer speech and a real, live human being? Maybe you've always thought you could. Maybe you're fond of Alexa and Siri but believe you would never confuse either of them with an actual woman. Things are about to get a lot more interesting. Google engineers have been hard at work creating a text-to-speech system called Tacotron 2. According to a paper they published this month, the system first creates a spectrogram of the text, a visual representation of how the speech should sound.


Google develops human-like text-to-speech artificial intelligence system

#artificialintelligence

In a major step towards its "AI first" dream, Google has developed a text-to-speech artificial intelligence (AI) system whose human-like articulation may leave you unsure whether you are hearing a person. The tech giant's text-to-speech system, called "Tacotron 2", delivers AI-generated computer speech that almost matches the human voice, technology news website Inc.com reported. At the Google I/O 2017 developers conference, the company's Indian-origin CEO Sundar Pichai announced that the internet giant was shifting its focus from mobile-first to "AI first" and launched several products and features, including Google Lens, Smart Reply for Gmail and Google Assistant for iPhone. According to a paper published on arXiv.org, the system first creates a spectrogram of the text, a visual representation of how the speech should sound. That image is put through Google's existing WaveNet algorithm, which turns it into audio and brings AI closer than ever to indiscernibly mimicking human speech.
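To make the idea of a spectrogram concrete, here is a minimal Python sketch that turns a waveform into a grid of frequency magnitudes over time. It is an illustration only: the excerpt describes Tacotron 2 as creating its spectrogram from text, whereas this toy computes one from existing audio. The function name spectrogram, the frame size, and the 1 kHz test tone are assumptions made for the example, not anything taken from Google's paper.

```python
import cmath
import math


def spectrogram(samples, frame_size):
    """Split samples into non-overlapping frames and return, per frame,
    the DFT magnitudes for bins 0..frame_size // 2."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Naive discrete Fourier transform of one frame at bin k.
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]

spec = spectrogram(tone, FRAME_SIZE)
# Each row is one time slice; its largest value marks the dominant frequency.
peak_bins = [row.index(max(row)) for row in spec]
peak_hz = [b * SAMPLE_RATE / FRAME_SIZE for b in peak_bins]
```

Plotting the rows of spec side by side, with time on one axis and frequency bin on the other, gives the "visual representation" the article refers to.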


Hacked Dog Pics Can Play Tricks on Computer Vision AI

#artificialintelligence

Researchers at the Massachusetts Institute of Technology (MIT) have demonstrated a new way to fool computer vision algorithms that enable artificial intelligence systems to see. The researchers exploited the Google Cloud Vision API that enables anyone to perform image labeling, face and landmark detection, optical character recognition, and tagging of explicit content. Traditional hacking approaches are inefficient and impractical when targeting large images with tens of thousands of pixels. To overcome this problem, the MIT team adapted a "natural evolution strategies" method that generates smaller populations of images around the larger image, with large random groups of pixels being perturbed instead of single pixels. Then, given the classifier's output on these randomly perturbed images, the system recovers what the contribution of each individual pixel is to the classification output, according to MIT researcher Andrew Ilyas.
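The query-only idea described above can be sketched in a few lines of Python. This is a toy illustration of an evolution-strategies style gradient estimate, not the MIT team's code: the "classifier" is a hypothetical linear score with hidden weights, the input has four values rather than thousands of pixels, and every coordinate is perturbed at once rather than in the pixel groups the excerpt mentions. The names nes_gradient, toy_score, and HIDDEN_WEIGHTS are assumptions made for the example.

```python
import random


def nes_gradient(score, x, sigma=0.1, pairs=5000, seed=0):
    """Estimate the gradient of a black-box score at x using only queries."""
    rng = random.Random(seed)
    grad = [0.0] * len(x)
    for _ in range(pairs):
        # Perturb every coordinate at once with Gaussian noise.
        noise = [rng.gauss(0.0, 1.0) for _ in x]
        plus = score([xi + sigma * ui for xi, ui in zip(x, noise)])
        minus = score([xi - sigma * ui for xi, ui in zip(x, noise)])
        # Antithetic pair: weight the noise direction by the score difference.
        weight = (plus - minus) / (2.0 * sigma * pairs)
        for i, ui in enumerate(noise):
            grad[i] += weight * ui
    return grad


# Hypothetical stand-in for a classifier's confidence: a hidden linear score.
HIDDEN_WEIGHTS = [1.0, -2.0, 0.5, 3.0]


def toy_score(pixels):
    return sum(w * p for w, p in zip(HIDDEN_WEIGHTS, pixels))


estimate = nes_gradient(toy_score, [0.2, 0.4, 0.6, 0.8])
```

The estimator never reads HIDDEN_WEIGHTS directly; it only sees scores for perturbed inputs, yet the averaged result approximates each coordinate's contribution to the output.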


Google's new text-to-speech system sounds convincingly human

#artificialintelligence

Get ready for the little person living inside your phone and speaker to sound a lot more life-like. Google believes it has reached a new milestone in the quest to make computer-generated speech indistinguishable from human speech with Tacotron 2, a system that trains neural networks to generate eerily natural-sounding speech from text, and it has the samples to prove it. In a research paper published earlier this month, though yet to be peer-reviewed, Google asserts that previous approaches to text-to-speech (TTS) systems have thus far failed to achieve a genuinely natural sound. Google says techniques such as concatenative synthesis, in which pre-recorded samples of speech are stitched together, and statistical parametric speech synthesis have been insufficient, explaining, "The audio produced by these systems often sounds muffled and unnatural compared to human speech." With Tacotron 2 (which is not the same as the world-ending super-weapon used by Lord Business), the company says it has incorporated ideas from its previous TTS systems, WaveNet and the first Tacotron, to reach a new level of fidelity.


Text detection API showdown: Google Vision vs Microsoft vs Amazon

#artificialintelligence

Detecting and reading text from photos has multiple use cases, from taking a picture of printed text and automatically converting it into a digital file to the newer application of reading bills and invoices. Other interesting use cases include deep image search, understanding local business listings from street view images and, when combined with text translation, taking a picture of a billboard in a foreign country and having it converted to your native language. The possibilities are limitless. Image text recognition is a class of computer vision problems that includes, among other things, OCR (optical character recognition), text detection (used to find printed text in images) and handwritten text recognition. With the advancement of deep learning, text recognition has become substantially better, but even the best companies in the business have a long way to go before this problem can be considered solved. Most of the major technology companies and cloud services provide APIs to recognize text in an image.
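One common way to score text recognition output is the character error rate: the edit distance between what a service returned and the true text, divided by the length of the true text. The Python sketch below shows that metric under stated assumptions. It is not necessarily how the article ran its comparison, and the service names and recognized strings are hypothetical placeholders rather than real API results.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(recognized, reference):
    """Edits needed to turn the OCR output into the reference,
    per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(recognized, reference) / len(reference)


# Hypothetical outputs from two unnamed services for the same image.
reference = "INVOICE 2041"
outputs = {"service_a": "INVOICE 2041", "service_b": "INV0ICE 204"}
scores = {
    name: character_error_rate(text, reference)
    for name, text in outputs.items()
}
```

A lower score is better; a perfect transcription scores 0.0, and the score can exceed 1.0 when a service returns far more text than the reference contains.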


Telstra Ventures backs robotic digitisation company Ripcord

ZDNet

Telstra's venture capital arm Telstra Ventures has announced its backing of a robotic digitisation company, which it said is the world's first to specialise in document management. Ripcord, based in California, provides records management services including scanning, indexing, automatic classification of documents, shipping, and unlimited access to the Ripcord Canopy cloud platform. Telstra Ventures MD Mark Sherman put the value of the records management market at around $62 billion. Ripcord's robotic digitisation capability scans paper records and applies optical character recognition (OCR), uploading them to Canopy, which can then be integrated into enterprise systems. This enables a 90 percent cost saving on document management for businesses, Sherman said. "Ripcord is working towards a paperless world, and is using sophisticated automation and software to turn their paper records into a secure, fast, and all-inclusive records management solution," Sherman added.


AI in CRM for Wealth Management: Sizzle or Steak?

#artificialintelligence

I'm a regular conference-goer. I go to learn about and discuss all things wealthTech, and every year we see the bandwagon steer towards the same trends. This year especially, though it has certainly been true of the last few years, we seem to have latched on to artificial intelligence (AI) and machine learning (ML). I see demos of some really cool technology. But it's usually just that: technology.


Artificial Intelligence – A human revolution

#artificialintelligence

There is not one but many definitions of Artificial Intelligence (AI), and its scope remains fluid and evolving. Some people even state that AI is everything that has not yet been done, referring to the observation that as the tools we use daily become increasingly sophisticated, tasks previously considered as requiring 'intelligence' are now considered routine and get excluded from the AI definition. Think, for example, of a good spam filter, spell check or optical character recognition, all of which used to be considered revolutionary but no longer impress people today. 'A constellation of technologies that extend human capabilities by sensing, comprehending, acting and learning – allowing people to do much more.' In other words, we put the focus on the ability of AI to complement and empower people instead of replacing them.