Speech


Speech-to-Text Selection - David Borish, PRIMO AI - Voice Tech Podcast ep.038 - Voice Tech Podcast

#artificialintelligence

David Borish is Chief Creative at PRIMO AI, a New York startup that recommends the highest-performing speech-to-text (STT) and Natural Language Understanding (NLU) services for a particular dataset and geographical region. We discover what the biggest problem with speech-to-text systems is today, and why trying to solve it by hiring data scientists can be prohibitively expensive. We also discuss the advantages of acquiring a technology patent, why David recently chose to enter the voice space, and the approach he takes when selecting his next entrepreneurial challenge. David is a seasoned startup veteran who believes passionately in the future of voice, and our conversation contains many valuable lessons to take away.


How to Make Neural Language Models Practical for Speech Recognition : Alexa Blogs

#artificialintelligence

An automatic-speech-recognition system -- such as Alexa's -- converts speech into text, and one of its key components is its language model. Given a sequence of words, the language model computes the probability that any given word is the next one. For instance, a language model would predict that a sentence that begins "Toni Morrison won the Nobel" is more likely to conclude "Prize" than "dries". Language models can thus help decide between competing interpretations of the same acoustic information. Conventional language models are n-gram based, meaning that they model the probability of the next word given the past n-1 words.
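The n-gram idea described above can be sketched in a few lines of Python. This is a minimal illustration, not Alexa's implementation: the toy corpus is invented, the helper names `train_ngram` and `next_word_prob` are hypothetical, and the estimate is a plain maximum-likelihood count ratio with no smoothing.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])  # the previous n-1 words
        counts[context][tokens[i]] += 1
    return counts


def next_word_prob(counts, context, word):
    """Maximum-likelihood estimate of P(word | context); 0.0 for an unseen context."""
    followers = counts.get(tuple(context))
    if not followers:
        return 0.0
    return followers[word] / sum(followers.values())


# Toy corpus, invented for illustration only.
corpus = (
    "she won the nobel prize . "
    "he won the nobel prize . "
    "the nobel committee met . "
    "the paint dries slowly ."
).split()

# A trigram model (n = 3) conditions on the past two words.
trigram = train_ngram(corpus, n=3)
p_prize = next_word_prob(trigram, ["the", "nobel"], "prize")
p_dries = next_word_prob(trigram, ["the", "nobel"], "dries")
print(p_prize, p_dries)
```

With this corpus the model assigns "prize" a higher probability than "dries" after "the nobel", which is how a language model helps choose between acoustically similar candidates. A practical model would also need smoothing so that unseen word sequences do not receive zero probability.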


5 ways that future A.I. assistants will take voice tech to the next level - Digital Trends

#artificialintelligence

Since Siri debuted on the iPhone 4s back in 2011, voice assistants have gone from unworkable gimmick to the basis for smart speaker technology found in one in six American homes. "Before Siri, when I talked about [what I do] there were blank stares," Tom Hebner, head of innovation at Nuance Communications, which develops cutting-edge A.I. voice technology, told Digital Trends. "People would say, 'Do you build those horrible phone systems?' That was one group of people's only interaction with voice technology." According to eMarketer forecasts, almost 100 million smartphone users will be using voice assistants by 2020.


Voice Technology And CRM: A New Partnership?

#artificialintelligence

That's what companies like Salesforce are expecting as they invest in technology like Einstein Voice Assistant to help make it even easier for sales staff to track, message, update, and notify their teams about relevant customer-oriented data. And you can be sure that the likes of Microsoft Dynamics, SAP and other CRM leaders will follow closely with this capability in the coming year as voice technology picks up speed. But what do marketers and sales leaders need to know about this advancement? How will their work be impacted by voice technology and CRM? The short answer: voice is about to shape marketing and customer experience in big ways.


Azure Media Services' new AI-powered innovation

#artificialintelligence

At Microsoft, our mission is to empower every person and organization on the planet to achieve more. The media industry exemplifies this mission. We live in an age where more content is being created and consumed in more ways and on more devices than ever. At IBC 2019, we're delighted to share the latest innovations we've been working on and how they can help transform your media workflows. Read on to learn more, or join our product teams and partners at Hall 1 Booth C27 at the RAI in Amsterdam from September 13th to 17th.


RealTalk (Pt. II): How We Recreated Joe Rogan's Voice Using AI

#artificialintelligence

ICYMI: Earlier this summer we broke new ground with RealTalk, a speech synthesis system created by Machine Learning Engineers at Dessa. With their AI-powered text-to-speech system, the team managed to replicate the voice of Joe Rogan, a podcasting legend known for his irreverent takes on consciousness, sports and technology. On top of that, their recreation of Rogan's voice is the most realistic AI voice that's been released to date. If you haven't heard the voice yet, you should: we shared a video on YouTube featuring a medley of their faux Rogan's musings. Since then, the public's response to the work has wowed us.


McDonald's acquires voice-recognition company to improve its drive-thru game

#artificialintelligence

McDonald's announced it will McBuy the Bay Area voice-recognition startup Apprente for an undisclosed amount. According to McDonald's, Apprente's "sound-to-meaning" technology handles "complex, multilingual, multi-accent and multi-item conversational ordering," and the company believes the technology will help streamline the drive-thru process -- even faster food, you say? As the earth turns and the centuries change, so does the way people wish to order a Big Mac, and Micky D's has the cash to listen. Back in March, the company bought Dynamic Yield, which customizes drive-thru menus based on factors like weather, time of day, and customer order profiles. A month later, it invested in New Zealand app-designer Plexure, which will help connect customers to its new smart drive-thrus, among other things.


AI 'synthetic brains' will allow humans to be in '500 places at once'

#artificialintelligence

AI-powered synthetic brains will allow humans to operate 500 versions of themselves at once, according to the man behind Amazon's voice assistant. Igor Jablokov believes artificial intelligence will become so advanced we will be unable to distinguish between a real and a "synthetic" mind. The CEO of Pryon previously founded Yap, a fully-automated cloud platform for voice recognition, which was snapped up by Amazon before being used for the popular Alexa. The device uses a non-human voice to communicate with users, but Igor warns such technology could change with terrifying consequences. He told the Financial Times: "People will not be able to tell if they are interacting with you or your AI proxy. Right now, you could be doing two interviews at once."


Do you want fr-AI-s with that appy-meal? McDonald's gobbles machine-learning biz for space-age Drive Thrus

#artificialintelligence

McDonald's has wolfed down Apprente, an AI startup focused on voice recognition. One of America's biggest fast-food chains wants to get its greasy hands on machine learning. Apprente, based in Mountain View, California, was founded in 2017, and has been building speech-powered customer-service chatbots. Now, the team will be rebranded as McD Tech Labs, and will slap their technology into McDonald's Drive Thru service. "The initial focus of the Silicon Valley team will be to enhance technology for use in McDonald's Drive Thru," gushed the McFlurry giant in a statement.


Neural Text-to-Speech Makes Speech Synthesizers Much More Versatile : Alexa Blogs

#artificialintelligence

A text-to-speech system, which converts written text into synthesized speech, is what allows Alexa to respond verbally to requests or commands. Through a service called Amazon Polly, text-to-speech is also a technology that Amazon Web Services offers to its customers. Last year, both Alexa and Polly evolved toward neural-network-based text-to-speech systems, which synthesize speech from scratch, rather than the earlier unit-selection method, which strung together tiny snippets of pre-recorded sounds. In user studies, people tend to find speech produced by neural text-to-speech (NTTS) systems more natural-sounding than speech produced by unit selection. But the real advantage of NTTS is its adaptability, something we demonstrated last year in our work on changing the speaking style ("newscaster" versus "neutral") of an NTTS system.