Speech


r/MachineLearning - [R] Parallel Neural Text-to-Speech

#artificialintelligence

Abstract: In this work, we propose a non-autoregressive seq2seq model that converts text to spectrogram. It is fully convolutional and obtains about 17.5 times speed-up over Deep Voice 3 at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, it has even fewer attention errors than the autoregressive model on the challenging test sentences. Furthermore, we build the first fully parallel neural text-to-speech system by applying the inverse autoregressive flow (IAF) as the parallel neural vocoder. Our system can synthesize speech from text through a single feed-forward pass.
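As a rough illustration of why parallel synthesis is fast: a non-autoregressive model emits all spectrogram frames in a single forward pass instead of generating them one frame at a time. The following is a minimal PyTorch sketch of that idea only; the layer sizes, the fixed nearest-neighbor upsampling used as a crude length regulator, and every hyperparameter are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of non-autoregressive text-to-spectrogram synthesis.
# NOT the paper's model: layer sizes, the 4x upsampling used as a crude
# length regulator, and all hyperparameters are assumptions.
import torch
import torch.nn as nn

class ConvTextToSpec(nn.Module):
    def __init__(self, vocab_size=64, emb_dim=128, n_mels=80, upsample=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Encoder: 1-D convolutions over the whole text sequence at once.
        self.encoder = nn.Sequential(
            nn.Conv1d(emb_dim, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2), nn.ReLU(),
        )
        # Stretch text features to spectrogram frame rate so every output
        # frame can be predicted in parallel (no frame-by-frame loop).
        self.upsample = nn.Upsample(scale_factor=upsample, mode="nearest")
        self.decoder = nn.Sequential(
            nn.Conv1d(256, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, n_mels, kernel_size=5, padding=2),
        )

    def forward(self, tokens):                  # tokens: (batch, text_len)
        x = self.embed(tokens).transpose(1, 2)  # (batch, emb_dim, text_len)
        x = self.encoder(x)
        x = self.upsample(x)
        return self.decoder(x)                  # (batch, n_mels, frames)

model = ConvTextToSpec()
mel = model(torch.randint(0, 64, (1, 20)))      # one feed-forward pass
print(mel.shape)                                # torch.Size([1, 80, 80])
```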


Chatterbox is a DIY Kids Smart Speaker that Features Open-Source and Private Voice Assistant, Mycroft - Voicebot

#artificialintelligence

Chatterbox is a build-it-yourself, program-it-yourself smart speaker that teaches kids how to program a voice-based AI system. The company is able to ensure complete privacy because it uses Mycroft, an open-source voice assistant that is not always listening, does not collect any data, and does not serve advertising. In addition, the product is fully compliant with the Children's Online Privacy Protection Act (COPPA), which the Federal Trade Commission uses to regulate online services directed at children under 13 years of age. The company recently announced that it would launch a Kickstarter campaign on April 30th and ship to consumers and schools in December 2019, with a suggested retail price of $179. "We've homed in on privacy, safety, and accessibility because our mission is to provide the healthiest and safest alternative computing platform for children."


Google AI 'Translatotron' Can Make Anyone a Real-Time Polyglot

#artificialintelligence

Google AI yesterday released its latest research result in speech-to-speech translation, the futuristic-sounding "Translatotron." Billed as the world's first end-to-end speech-to-speech translation model, Translatotron promises the potential for real-time cross-linguistic conversations with low latency and high accuracy. Humans have always dreamed of a voice-based device that could enable them to simply leap over language barriers. While advances in deep learning have contributed to highly improved accuracy in speech recognition and machine translation, smooth conversations between speakers of different languages remain hampered by unnatural pauses during machine processing. Google's wireless earbuds, Pixel Buds, released in 2017, boasted real-time speech translation, but users found the practical experience less than satisfying.


Google's new AI can help you speak another language in your own voice

#artificialintelligence

Google Translate is one of the company's most used products. It helps people translate one language into another by typing, taking pictures of text, and using speech-to-text technology. Now the company is launching a new project called Translatotron, which will offer direct speech-to-speech translations – without using any text at all. In a post on Google's AI blog, the team behind the tool explained that instead of chaining speech-to-text and then text-to-speech to convert voice, the new system relies on a single neural network model.
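To make that architectural difference concrete, here is a toy, runnable contrast between the cascaded approach and a direct model. Every function below is a stand-in stub invented for illustration, not Google's code or API.

```python
# Toy contrast between cascaded (ASR -> MT -> TTS) and direct
# speech-to-speech translation. All functions are illustrative stubs.

def speech_to_text(audio: list) -> str:
    return "hello"                                  # stub ASR

def translate_text(text: str) -> str:
    return {"hello": "hola"}.get(text, text)        # stub MT

def text_to_speech(text: str) -> list:
    return [0.0] * 8 * len(text)                    # stub TTS

def cascaded(audio: list) -> list:
    # Three separate stages: each adds latency, errors compound, and the
    # speaker's original voice is lost at the text bottleneck.
    return text_to_speech(translate_text(speech_to_text(audio)))

def direct(audio: list) -> list:
    # Translatotron-style: one sequence-to-sequence model maps source
    # spectrograms straight to target spectrograms (stubbed here as an
    # identity), then a vocoder renders audio. No intermediate text is
    # produced, which also helps preserve the speaker's voice.
    return audio

print(len(cascaded([0.1] * 16)), len(direct([0.1] * 16)))
```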


Google's latest Translate function turns speech of one dialect directly into another

Daily Mail - Science & tech

Google has announced a new translate tool which converts one language into another and preserves the speaker's original voice. The tech giant's new system works without the need to convert speech to text first. A first of its kind, the tool is able to do this while retaining the voice of the original speaker and making it sound 'more realistic', the tech giant said. Google claims the system, dubbed 'Translatotron', will be able to retain the voice of the original speaker after translation while also understanding words better.


Spotify unveils a voice-controlled smart device, dubbed 'Car Thing'

Daily Mail - Science & tech

Spotify has launched a new voice-controlled smart device, marking its debut in the hardware industry. Dubbed 'Car Thing,' it plugs into a vehicle's 12-volt outlet, or cigarette lighter, and allows users to turn on their favorite playlist hands-free while they're driving. The device is being rolled out among a small group of test users in the coming weeks, according to The Verge.


Spotify is testing a voice-controlled device called "Car Thing"

USATODAY - Tech Top Stories

Spotify announced Friday that the music streaming service is test driving some hardware. The company is trying to learn more about what you do and listen to in your car by publicly testing a voice-controlled music and podcast device dubbed "Car Thing." The device reportedly plugs into your vehicle's 12-volt outlet, also known as the cigarette lighter, for power, and connects to your car and phone via Bluetooth. Don't make plans to go out and buy the device anytime soon, though. Spotify says it's only testing the devices, making them available to a few premium users.


Introducing real-time video tagging and speech-to-text

#artificialintelligence

We've been working with the world's top media companies to automatically tag their video assets for discoverability and higher engagement. They rely on us to extract not only what appears or is mentioned, but most importantly what is relevant about their content, in order to really understand it. This level of understanding is how we outperform even the tech giants when it comes to content metadata. When building a system for real-time scenarios such as live streaming, our partners were very clear that it wasn't just about speed but also quality: real-time understanding, not just tagging. So over the past few months, we've been redesigning our solution with this goal: reducing latency as much as possible while keeping our core focus on relevancy.


Alexa speech normalization AI reduces errors by up to 81%

#artificialintelligence

Text normalization is a fundamental processing step in most natural language systems. In the case of Amazon's Alexa, "Book me a table at 5:00 p.m." might be transcribed by the assistant's automatic speech recognizer as "five p m" and then reformatted to "5:00PM." Conversely, Alexa might convert "5:00PM" back to "five p m" for its text-to-speech synthesizer. So how does this work? Currently, Amazon's voice assistant relies on "thousands" of handwritten normalization rules for dates, email addresses, numbers, abbreviations, and other expressions, according to Alexa AI group applied scientist Ming Sun and Alexa Speech machine learning scientist Yuzong Liu.
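As a concrete, purely illustrative example of what one such handwritten rule might look like, the sketch below normalizes clock times into a speakable form for a TTS front end; the regex and word table are assumptions for illustration, not Amazon's actual rule set.

```python
import re

# Sketch of one handwritten normalization rule of the kind the article
# describes: rewrite times like "5:00PM" into a speakable form for TTS.
# The regex and word table are illustrative, not Amazon's rules.
WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 6: "six",
         7: "seven", 8: "eight", 9: "nine", 10: "ten", 11: "eleven",
         12: "twelve", 30: "thirty"}

def spoken_time(match):
    hour, minute, period = match.groups()
    spoken = WORDS[int(hour)]
    if minute != "00":                          # "5:30PM" -> "five thirty"
        spoken += " " + WORDS.get(int(minute), minute)
    return spoken + " " + " ".join(period.lower())  # "PM" -> "p m"

def normalize_for_tts(text):
    return re.sub(r"\b(\d{1,2}):(\d{2})\s*(AM|PM)\b", spoken_time, text)

print(normalize_for_tts("Book me a table at 5:00PM"))   # ... five p m
print(normalize_for_tts("Book me a table at 5:30PM"))   # ... five thirty p m
```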

