Our second example deals with a more challenging problem: the recognition of hand-printed letters of the alphabet. The characters that people print in the ordinary course of filling out forms and questionnaires are surprisingly varied. Gaps abound where continuous lines might be expected; curves and sharp angles appear interchangeably; there is almost every imaginable distortion of slant, shape and size. Even human readers cannot always identify such characters; their error rate is about 3 per cent on randomly selected letters and numbers, seen out of context.
– from Oliver G. Selfridge & Ulric Neisser. Pattern Recognition by Machine. In Computers and Thought, Edward A. Feigenbaum and Julian Feldman (Eds.). MIT Press, Cambridge, MA, USA, 1963. pp. 8-30.
Google Lens' powers of recognition have improved, and it is now able to recognize over a billion products, four times more than it could a year ago. Google announced the milestone for its AI-powered image recognition app in a blog post, detailing how it taught Lens to read by combining a custom optical character recognition (OCR) engine with its language learnings from search and its Knowledge Graph. Thanks to the OCR engine and training, Lens has become very good at reading product labels and other text, which helps it to identify over a billion products. To distinguish between the letter "o" and a zero, for example, it leans on spelling correction models from Google Search. Aparna Chennapragada, vice president of Google Lens and AR, says the world is entering a new phase of computing that she calls the "era of the camera", helping users search what they see.
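The idea of using spelling knowledge to settle an "o" versus zero ambiguity can be illustrated with a small sketch. This is not Google's method: the confusion table, the tiny vocabulary and the function names below are all hypothetical, and a production system would rely on a statistical language model instead of a fixed word list.

```python
from itertools import product

# Hypothetical confusion sets: characters an OCR engine might mix up.
CONFUSABLE = {"o": "o0", "0": "o0", "l": "l1", "1": "l1"}

# Tiny stand-in vocabulary (an assumption for illustration only).
VOCAB = {"cola", "cool", "lemon", "100", "2001"}


def candidates(token):
    """Yield every spelling obtained by swapping confusable characters."""
    options = [CONFUSABLE.get(ch, ch) for ch in token.lower()]
    for combo in product(*options):
        yield "".join(combo)


def correct(token):
    """Return the first candidate found in VOCAB, else the token unchanged."""
    for cand in candidates(token):
        if cand in VOCAB:
            return cand
    return token
```

Under these assumptions, `correct("c0la")` resolves the zero to the letter "o" because only "cola" appears in the vocabulary, while a token with no known candidate is returned untouched.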
The Victorian Police Force has announced it will be implementing new number plate recognition and in-car video technology, thanks to an AU$17.3 million deal with Motorola Solutions. Under the five-year deal, 220 police vehicles will be fitted with high-resolution, cloud-based Automatic Number Plate Recognition (ANPR) technology that enables rapid scanning of thousands of vehicle number plates to identify dangerous and unauthorised drivers in real time. By March 2021, Highway Patrol vehicles will also be kitted out with new in-car video technology that will allow officers to record audio and video footage of road policing activities, including roadside intercepts. According to the state government, the number plate technology is part of an AU$43.8 million investment to boost Victoria Police's capacity to target dangerous drivers and unregistered vehicles. "By combining ANPR detection with in-car video, this solution will provide high quality visual and audio corroboration of incidents and offences witnessed by police," Motorola Solutions VP and MD Steve Crutchfield said.
Microsoft has reached a milestone in text-to-speech synthesis with a production system that uses deep neural networks to make the voices of computers nearly indistinguishable from recordings of people. With the human-like natural prosody and clear articulation of words, Neural TTS has significantly reduced listening fatigue when you interact with AI systems. Our team demonstrated our neural-network powered text-to-speech capability at the Microsoft Ignite conference in Orlando, Florida, this week. The capability is currently available in preview through Azure Cognitive Services Speech Services. Neural text-to-speech can be used to make interactions with chatbots and virtual assistants more natural and engaging, convert digital texts such as e-books into audiobooks and enhance in-car navigation systems.
Researchers at Facebook offered up a summary of a system they call "Rosetta," a machine learning approach that boosts traditional optical character recognition, or "OCR," to mine the hundreds of millions of photos uploaded to Facebook daily. Say you want to search for memes in images on Facebook: The site's challenge is to detect whether there are letters printed within an image, and then parse those letters to know what a phrase says. This technology has, of course, been in use for document processing for ages, but the challenge at Facebook was both to recognize text in any number of complex images, including text laid over the image, as in an internet meme, or text such as a sign that was part of the original image, and then to make it work at the scale of the site's constant stream of images. Facebook researchers Fedor Borisyuk, Albert Gordo, and Viswanath Sivakumar shared the work on Rosetta at the Knowledge Discovery and Data Mining conference in London in late August, in a formal paper, and today, two of the authors, Gordo and Sivakumar, along with Facebook's Manohar Paluri, offered up a somewhat simpler blog post describing the work. Facebook split up the task of "extracting" text from an image into two separate matters: first detecting whether there is text at all in an image, and then parsing what that word or phrase might be.
People online tend to communicate not just with words, but also with images. For a platform like Facebook with over 2 billion monthly active users, that means a plethora of images gets posted every day, including memes. In order to include images with text in relevant photo search results, to give screen readers a way to read what's written on them and to make sure they don't contain hate speech and other words that violate the website's content policy, Facebook has created and deployed a large-scale machine learning system called "Rosetta." Facebook needed an optical character recognition system that can regularly process huge volumes of content, so it had to conjure up its own technology. In a new blog post, the company explained how Rosetta works: it starts by detecting rectangular regions in images that potentially contain text.
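The two-stage split described for Rosetta (first find rectangular regions that may hold text, then read each region) can be sketched with a toy pipeline. This is only an illustration of the structure, not Facebook's implementation: the detector here is a simple connected-component search over a binary grid standing in for a learned model, and `detect_regions`, `extract_text` and the `recognize` callback are hypothetical names.

```python
from collections import deque


def detect_regions(grid):
    """Stage 1: bounding boxes (top, left, bottom, right) of 4-connected
    groups of nonzero cells, in row-major discovery order."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                # Grow the box to cover every cell in this component.
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def extract_text(grid, recognize):
    """Stage 2: crop each detected box and hand it to a recognizer."""
    results = []
    for top, left, bottom, right in detect_regions(grid):
        crop = [row[left:right + 1] for row in grid[top:bottom + 1]]
        results.append(((top, left, bottom, right), recognize(crop)))
    return results
```

Keeping the stages separate means the recognizer only ever sees small crops, so either stage could be swapped for a stronger model without touching the other.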
Effect.AI published a new video blog informing the community about some of the latest developments regarding the project. Effect.AI's vision for its 'Mechanical Turk' microtask platform is simply inspiring, as its leadership is clearly choosing to build Effect.AI in a socially responsible way. Let's recap what the project is about, have a listen to what Polina Boykova has to say and connect some of the dots regarding the project's ambitions. Effect.AI has a three-staged approach with regards to building what they call the 'Effect Network'. In the first and current stage, Effect.AI is aiming at disintermediating and decentralizing crowdsourced 'micro-tasking'.
Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 30 voices, available in multiple languages and variants. It applies DeepMind's groundbreaking research in WaveNet and Google's powerful neural networks to deliver high fidelity audio. With this easy-to-use API, you can create lifelike interactions with your users, across many applications and devices.
Google Cloud on Tuesday announced the general availability of its Cloud Text-to-Speech API, which lets developers add natural-sounding speech to their devices or applications. The API also now offers a feature to optimize the speech for specific kinds of speakers. Google has also added several new WaveNet voices to the API, opening up opportunities for natural-sounding speech in more languages and a wider variety of voices. Google first announced Text-to-Speech in March, illustrating how Google has been able to leverage technology from its acquisition of DeepMind. The AI company created WaveNet, a deep neural network for generating raw audio.
Early experiments in computer vision took place in the 1950s, using some of the first neural networks to detect the edges of an object and to sort simple objects into categories like circles and squares. In the 1970s, the first commercial use of computer vision interpreted typed or handwritten text using optical character recognition. This advancement was used to interpret written text for the blind. As the internet matured in the 1990s, making large sets of images available online for analysis, facial recognition programs flourished. These growing data sets helped make it possible for machines to identify specific people in photos and videos.
Twilio is giving developers more control over their interactive voice applications with built-in support for Amazon Polly -- the AWS text-to-speech service that uses deep learning to synthesize speech. The integration adds more than 50 human-sounding voices in 25 languages to the Twilio platform, the cloud communications company announced Monday. In addition to offering access to different voices and languages, Polly will enable developers using Twilio's Programmable Voice to control variables like the volume, pitch, rate and pronunciation of the voices that interact with end users. Programmable Voice has long offered a built-in basic text-to-speech (TTS) service that supports three voices, each with its own supported set of languages. TTS capabilities, however, have improved dramatically in recent years, and Twilio notes that Amazon has been at the forefront of these improvements.
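Controls such as volume, rate and pitch are commonly expressed in text-to-speech systems through the `prosody` element of SSML, the W3C speech markup standard. The sketch below builds such markup as a plain string. It assumes the target service accepts standard SSML, and `prosody_ssml` is a hypothetical helper, not part of any Twilio or Amazon SDK; which attributes a given voice honours should be checked against that service's documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, volume="medium", rate="medium", pitch="medium"):
    """Wrap text in an SSML <prosody> element (hypothetical helper)."""
    attrs = (
        f"volume={quoteattr(volume)} "
        f"rate={quoteattr(rate)} "
        f"pitch={quoteattr(pitch)}"
    )
    # escape() protects characters such as & and < inside the spoken text.
    return f"<speak><prosody {attrs}>{escape(text)}</prosody></speak>"
```

For example, `prosody_ssml("Your order has shipped", rate="slow")` yields markup asking for a slower delivery while leaving volume and pitch at their default labels.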