"Automatic speech recognition (ASR) is one of the fastest growing and commercially most promising applications of natural language technology. Speech is the most natural communicative medium for humans in many situations, including applications such as giving dictation; querying database or information-retrieval systems; or generally giving commands to a computer or other device, especially in environments where keyboard input is awkward or impossible (for example, because one's hands are required for other tasks)."
– from Linguistic Knowledge and Empirical Methods in Speech Recognition. By Andreas Stolcke. (1997). AI Magazine 18 (4): 25-32.
David Borish is Chief Creative at PRIMO AI, a New York startup that recommends the highest-performing speech-to-text (STT) and natural language understanding (NLU) services for a particular dataset and geographical region. We discover what the biggest problem with speech-to-text systems is today, and why trying to solve it by hiring data scientists can be prohibitively expensive. We also discuss the advantages of acquiring a technology patent, why David recently chose to enter the voice space, and the approach he takes when selecting his next entrepreneurial challenge. David is a seasoned startup veteran who believes passionately in the future of voice, and our conversation contains many valuable lessons to take away.
An automatic-speech-recognition system -- such as Alexa's -- converts speech into text, and one of its key components is its language model. Given a sequence of words, the language model computes the probability that any given word is the next one. For instance, a language model would predict that a sentence that begins "Toni Morrison won the Nobel" is more likely to conclude "Prize" than "dries". Language models can thus help decide between competing interpretations of the same acoustic information. Conventional language models are n-gram based, meaning that they model the probability of the next word given the past n-1 words.
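The n-gram idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not how Alexa's language model is built: the toy corpus, the function names train_ngram and next_word_prob, and the choice of unsmoothed maximum-likelihood counts are all assumptions made for the example. Production systems train on far larger corpora and add smoothing so that unseen word sequences do not receive zero probability.

```python
from collections import Counter, defaultdict


def train_ngram(sentences, n=2):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with start markers so the first real word has a full context.
        tokens = ["<s>"] * (n - 1) + sentence.lower().split()
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            counts[context][tokens[i]] += 1
    return counts


def next_word_prob(counts, context, word):
    """Maximum-likelihood estimate of P(word | context); 0.0 if context unseen."""
    followers = counts.get(tuple(w.lower() for w in context))
    if not followers:
        return 0.0
    return followers[word.lower()] / sum(followers.values())


# Toy corpus, invented for illustration only (an assumption, not real data).
corpus = [
    "toni morrison won the nobel prize",
    "she won the nobel prize in literature",
    "the nobel prize was awarded in stockholm",
    "the paint dries slowly",
]

# A trigram model (n=3) conditions on the previous n-1 = 2 words.
model = train_ngram(corpus, n=3)
p_prize = next_word_prob(model, ["the", "nobel"], "prize")
p_dries = next_word_prob(model, ["the", "nobel"], "dries")
```

In this toy setting, p_prize comes out higher than p_dries, which is the kind of evidence a recognizer could use to prefer "Prize" over "dries" when the audio alone is ambiguous.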
Since Siri debuted on the iPhone 4s back in 2011, voice assistants have gone from unworkable gimmick to the basis for smart speaker technology found in one in six American homes. "Before Siri, when I talked about [what I do] there were blank stares," Tom Hebner, head of innovation at Nuance Communications, which develops cutting-edge A.I. voice technology, told Digital Trends. "People would say, 'Do you build those horrible phone systems?' That was one group of people's only interaction with voice technology." According to eMarketer forecasts, almost 100 million smartphone users will be using voice assistants by 2020.
That's what companies like Salesforce are expecting as they invest in technology like Einstein Voice Assistant to help make it even easier for sales staff to track, message, update, and notify their teams about relevant customer-oriented data. And you can be sure that the likes of Microsoft Dynamics, SAP and other CRM leaders will follow closely with this capability in the coming year as voice technology picks up speed. But what do marketers and sales leaders need to know about this advancement? How will their work be impacted by voice technology and CRM? The short answer: voice is about to shape marketing and customer experience in big ways.
McDonald's announced it will McBuy the Bay Area voice-recognition startup Apprente for an undisclosed amount. According to McDonald's, Apprente's "sound-to-meaning" technology handles "complex, multilingual, multi-accent and multi-item conversational ordering," and the company believes the technology will help streamline the drive-thru process -- even faster food, you say?? As the earth turns and the centuries change, so does the way people wish to order a Big Mac, and Mickey D's has the cash to listen. Back in March, the company bought Dynamic Yield, which customizes drive-thru menus based on factors like weather, time of day, and customer order profiles. A month later, it invested in New Zealand app-designer Plexure, which will help connect customers to its new smart drive-thrus, among other things.
AI-powered synthetic brains will allow humans to operate 500 versions of themselves at once, according to the man behind Amazon's voice assistant. Igor Jablokov believes artificial intelligence will become so advanced we will be unable to distinguish between a real and a "synthetic" mind. The CEO of Pryon previously founded Yap, a fully automated cloud platform for voice recognition, which was snapped up by Amazon before being used for the popular Alexa. The device uses a non-human voice to communicate with users, but Igor warns such technology could change with terrifying consequences. He told the Financial Times: "People will not be able to tell if they are interacting with you or your AI proxy. Right now, you could be doing two interviews at once."
In the shadow of a blockbuster Apple press conference earlier this afternoon, Amazon quietly announced the general availability of the Alexa Auto SDK 2.0, the latest version of the software development kit that enables automotive OEMs to integrate Alexa into their vehicles. This release ships with a suite of tools for enabling Alexa to play music, perform navigation, and control basic car functions, and for allowing access to the assistant even when internet connectivity is limited or nonexistent. To this end, the SDK includes design guidelines and ready-to-run sample apps for most automotive platforms, along with C++ and Java libraries that facilitate the processing of audio inputs and triggers and help to establish a connection with the Alexa service. It also includes documentation for Android, Linux, Automotive Grade Linux (AGL), and QNX operating systems on ARM and x86 processor architectures. The Alexa Auto SDK 2.0 supports core Alexa functionality, such as speech recognition and text-to-speech, as well as other capabilities such as streaming media, notifications, weather reports, and over 90,000 first- and third-party voice apps.
When McDonald's spent over $300 million on big-data-crunching startup Dynamic Yield earlier this year, the move came as something of a surprise. Today the Golden Arches announced the acquisition of Apprente, a voice AI system focused on fast-food ordering. It's a niche, but it just paid off. Specific terms of the deal have not been disclosed. But the synergies are at least more immediately understandable.
Rachel Ashby is the Senior Principal Product Marketing Manager for Nuance Core Technologies, automatic speech recognition, text-to-speech and transcription engine, and Nuance APIs, Tooling and Analytics. Before joining Nuance, Ashby worked in various worldwide marketing and sales positions at IBM, including driving IBM Cloud marketing strategy, development and execution for global multi-million-dollar campaigns. As an Associate Partner in IBM Global Services, she worked closely with some of IBM's largest Fortune 100 clients to plan and deliver successful software deployments. Ashby has over 20 years of experience in the high-tech industry. Eduardo is the Director of User Experience within Nuance's Technology Advancement Group (TAG).
Increasingly, voice assistants from vendors such as Amazon, Apple, Google, Microsoft, and others are starting to find their way into a myriad of devices, products, and tools used on a daily basis. While once we might have only interacted with conversational systems on our phones, dedicated desktop appliances, or desktop computers, we can now find conversational interfaces on a wide range of appliances and products from televisions to cars and even toaster ovens. Soon, any device we can interact with will have an audio conversational interface instead of buttons or screens to type or click. The dawn of the conversational computing age is here. However, are these devices intelligent enough to handle the wide range of queries that humans are posing?