For technology users who have marveled at the ability of Siri or Alexa to recognize their voices, consider this: the National Security Agency has apparently been way ahead of Apple or Amazon. The agency has at its disposal voice recognition technology that it employs to identify terrorists, government spies, or anyone else it chooses -- with just a phone call, according to a report by The Intercept. The report, part of a trove of documents leaked by former NSA contractor Edward Snowden, describes how the NSA uses recorded audio to create a "voiceprint" -- a map of the qualities that mark a voice as singular -- and identify the person speaking. The documents also suggest the agency is continuously improving its speech recognition capabilities, the publication noted.
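The report does not detail the NSA's method, but the basic idea behind any voiceprint system can be sketched simply: summarize a voice as a fixed-length feature vector, then compare vectors by similarity. The random vectors and 0.1 noise scale below are purely illustrative stand-ins for real spectral features.

```python
import numpy as np

# Hypothetical voiceprints: fixed-length vectors summarizing a voice.
# Real systems derive these from spectral features of recorded speech;
# random vectors stand in here for illustration only.
rng = np.random.default_rng(0)
enrolled = rng.normal(size=128)                          # stored print for a known speaker
same_call = enrolled + rng.normal(scale=0.1, size=128)   # same voice, new recording
other_call = rng.normal(size=128)                        # a different speaker

def cosine(a, b):
    """Cosine similarity between two voiceprint vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The genuine match scores far higher than the impostor.
print(cosine(enrolled, same_call) > cosine(enrolled, other_call))  # True
```

In practice the comparison would run against a database of enrolled prints, with a tuned threshold deciding whether two recordings share a speaker.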
This article first appeared in Data Sheet, Fortune's daily newsletter on the top tech news. I was in Atlanta Thursday, and for the second time in two weeks I was reminded that Silicon Valley has no monopoly on innovation. I visited a company adjacent to Georgia Tech called Pindrop, which makes voice authentication and security products used by financial services companies and the like to cut down on fraud. Its AI-driven software listens to customer responses, trimming annoying verification questions as well as catching fraudulent behavior. Pindrop has some mind-blowing capabilities.
Mozilla has released its Common Voice collection, which contains almost 400,000 recordings from 20,000 people and is claimed to be the second-largest voice dataset publicly available. The voice samples were gathered through Mozilla's Common Voice project, which let users donate their utterances via an iOS app or website. The hope is that creating a large public dataset will allow for better voice-enabled applications. "One reason so few services are commercially available is a lack of data," Mozilla senior vice president of emerging technologies Sean White said in a blog post. "Startups, researchers, or anyone else who wants to build voice-enabled technologies need high-quality, transcribed voice data on which to train machine-learning algorithms."
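Common Voice releases ship audio clips alongside a tab-separated index pairing each clip with its transcript. A minimal sketch of consuming such an index follows; the column names (`client_id`, `path`, `sentence`) and the sample rows are assumptions for illustration, not taken from an actual release file.

```python
import csv
import io

# Hypothetical excerpt in a tab-separated layout like Common Voice's clip
# index; the columns and rows here are illustrative assumptions.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\n"
    "a1b2\tclips/0001.mp3\tThe quick brown fox jumps over the lazy dog.\n"
    "c3d4\tclips/0002.mp3\tDonating a voice sample takes only seconds.\n"
)

def load_clips(tsv_text):
    """Return (audio path, transcript) pairs from a clip index."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(row["path"], row["sentence"]) for row in reader]

pairs = load_clips(SAMPLE_TSV)
print(len(pairs))  # 2
```

Pairs like these -- audio plus verified transcript -- are exactly the supervision signal speech-recognition training needs.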
We are on the cusp of a technological revolution whereby increasingly sophisticated tasks can be handed over from humans to machines. Organizations are embracing advancements in artificial intelligence, robotics, and natural language technology to adopt platforms that can "learn" from experience and actually interact with users. The next wave of these chatbots will have enhanced real-time data analytics and automation capabilities and the ability to integrate intelligence across multiple digital channels to engage customers in natural conversations using voice or text. When you have a question about a product or service, you will be presented with the best agent, who possesses the entire company's collective experience and a huge wealth of knowledge to address your issue. Think about what happens today when you call your bank or the help desk of an ecommerce site.
Voice recognition has come on in leaps and bounds over recent years, and a new AI can now pick out an individual's voice from a crowd. The system uses machine learning to identify the voiceprints of a group of speakers, then reconstructs what each person has said. This overcomes what researchers have dubbed the 'cocktail party effect' and could lead to improved smart assistants and better automated transcription of speech. In tests the system could differentiate the voices of up to five people speaking at the same time.
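The core trick in many such separation systems is masking: transform the mixed audio into the frequency domain, keep only the components attributable to one speaker, and transform back. The toy sketch below uses two pure tones as stand-in "speakers" and a hand-set frequency cutoff; real systems learn the mask with a neural network, so the tones and cutoff are illustrative assumptions only.

```python
import numpy as np

# Two "speakers" as pure tones at different pitches, mixed together.
rate = 8000
t = np.arange(rate) / rate
speaker_a = np.sin(2 * np.pi * 220 * t)
speaker_b = np.sin(2 * np.pi * 660 * t)
mixture = speaker_a + speaker_b

# Move to the frequency domain, zero out everything except the bins
# near speaker A, then transform back to recover A's signal.
spectrum = np.fft.rfft(mixture)
freqs = np.fft.rfftfreq(rate, d=1 / rate)
mask = np.abs(freqs - 220) < 100
recovered = np.fft.irfft(spectrum * mask, n=rate)

# The recovered waveform closely tracks speaker A alone.
corr = np.corrcoef(recovered, speaker_a)[0, 1]
print(round(corr, 2))  # 1.0
```

With real speech the speakers overlap in frequency, which is why the mask must be predicted per time-frequency bin from learned voiceprints rather than fixed by a simple cutoff.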
Four months ago, we reported that Google would be bringing voice query data to Search Analytics within the Google Search Console reporting engine. But when Gary Illyes was asked about it just a week or so ago, he had no updates to share. Simon Heseltine quoted Google's Gary Illyes as saying on stage, "I don't know if anyone is working on filtering out voice data at this time." Jenny Halasz quoted him saying, "We're getting voice query data in search console, but we're just not able to filter. So we have no ETA on when this might be coming."
Intel and Amazon are partnering to combine the former's silicon and smarts with the latter's Alexa voice platform. The chipmaker has introduced the Intel Speech Enabling Developer Kit to provide a "complete audio front-end solution for far-field voice control," according to a press release. The idea is that Intel has done the hard work of designing the mic arrays and voice systems, and all developers will need to do is write applications for them. The kit offers algorithms for echo cancellation, beamforming, and wake words, along with an 8-mic array and the company's dual digital signal processor. It is up for pre-order starting today for $399.
Squeezing down, say, the AI that powers Amazon's assistant Alexa to run on simple battery-powered chips with clock speeds of just hundreds of megahertz isn't feasible. That's partly because Alexa has to interpret a lot of different sounds, but also because most voice recognition AIs use resource-hungry neural networks, which is why Alexa sends its processing to the cloud. The researchers' first attempts required eight million calculations to analyze a one-second clip of audio with 89 percent accuracy. Instead, one researcher suggests that slightly higher-power chips, which can summon more of the linguistic capabilities of the kind found in Google Assistant and Amazon's Alexa, may be better suited to consumer applications.
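A back-of-envelope calculation puts the eight-million figure in context. Assuming (simplistically) one calculation per clock cycle and a hypothetical 200 MHz core -- both assumptions mine, not from the article -- a second of audio would consume only a small fraction of the chip's cycle budget:

```python
# Back-of-envelope: eight million calculations per second of audio
# against a hypothetical 200 MHz core, idealized at one calculation
# per clock cycle (both figures are illustrative assumptions).
ops_per_audio_second = 8_000_000
clock_hz = 200_000_000
utilization = ops_per_audio_second / clock_hz
print(utilization)  # 0.04
```

Real workloads need many cycles per calculation plus memory traffic, which is why even this trimmed-down model strains such chips and full Alexa-scale models remain out of reach on them.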