Raspberry Pi: Google plans more AI projects to follow DIY voice recognition kit

ZDNet

Google is working on more artificial intelligence projects to follow its AIY Voice Kit, a do-it-yourself voice-recognition kit for Raspberry Pi-based maker projects. The kit originally came with a copy of the official Raspberry Pi magazine. The initial run of the kits sold out in a few hours, but Google said more will be available for purchase in stores and online in the US in the coming weeks, and the kit will be available elsewhere by the end of the year.


Are Voice Recognition Based Payments The Next Step in FinTech Convenience? - FindBiometrics

#artificialintelligence

PayPal may be looking into voice recognition to enable more digital commerce use cases in the near future, if a new blog post following MWC offers any hints. Looking back on last week's event -- for which we featured extensive firsthand coverage -- PayPal Head of Global Initiatives Anuj Nayar notes two dominant trends. One is the Internet of Things, including new connected-car technologies like PayPal's car commerce feature with Shell and Jaguar (and Apple). The other, as Nayar puts it, is "conversational commerce." Looking at emerging digital commerce opportunities in areas like virtual reality, connected appliances, and even drones, Nayar asserts that it "won't be convenient or realistic to pull out a credit card or punch in your information in any of these scenarios."


Voice recognition and machine learning make service bots better

#artificialintelligence

We are on the cusp of a technological revolution in which increasingly sophisticated tasks can be handed over from humans to machines. Organizations are embracing advances in artificial intelligence, robotics, and natural language technology to adopt platforms that can "learn" from experience and actually interact with users. The next wave of these chatbots will have enhanced real-time data analytics, automation capabilities, and the ability to integrate intelligence across multiple digital channels to engage customers in natural conversations using voice or text. When you have a question about a product or service, you will be presented with the best agent, one backed by the company's collective experience and a wealth of knowledge to address your issue. Think about what happens today when you call your bank or the help desk of an ecommerce site.


Voice recognition software advancing rapidly. Will talking replace typing?

#artificialintelligence

Since Apple developed Siri, there have been great strides in the science of voice recognition. Will we soon be throwing away our mice and keyboards and simply talking to our computers? Or will the problems I have with Alexa continue to haunt voice recognition? My wife and I are like all married couples at breakfast. We do not speak to each other.


Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Neural Information Processing Systems

We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. Our system consists of three independently trained components: (1) a speaker encoder network, trained on a speaker verification task using an independent dataset of noisy speech from thousands of speakers without transcripts, to generate a fixed-dimensional embedding vector from seconds of reference speech from a target speaker; (2) a sequence-to-sequence synthesis network based on Tacotron 2, which generates a mel spectrogram from text, conditioned on the speaker embedding; (3) an auto-regressive WaveNet-based vocoder that converts the mel spectrogram into a sequence of time-domain waveform samples. We demonstrate that the proposed model is able to transfer the knowledge of speaker variability learned by the discriminatively trained speaker encoder to the new task, and is able to synthesize natural speech from speakers that were not seen during training. We quantify the importance of training the speaker encoder on a large and diverse speaker set in order to obtain the best generalization performance. Finally, we show that randomly sampled speaker embeddings can be used to synthesize speech in the voice of novel speakers dissimilar from those used in training, indicating that the model has learned a high-quality speaker representation.
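
The three-component architecture described in this abstract is concrete enough to sketch in code. Below is a minimal PyTorch sketch of the inference-time data flow only, not the authors' implementation: every network is a stub, and all module names, layer choices, and dimensions (e.g. a 256-dimensional speaker embedding, 40 and 80 mel channels, a hop size of 200) are illustrative assumptions. The point is the conditioning pattern: an embedding computed from a few seconds of reference audio is fed to the synthesizer alongside the text, and the vocoder then renders the predicted mel spectrogram as a waveform.

# Minimal sketch (not the authors' code) of the SV2TTS three-stage data flow.
# All names, layer types, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeakerEncoder(nn.Module):
    """Maps seconds of reference speech (as mel frames) to a fixed-size embedding."""
    def __init__(self, n_mels=40, hidden=256, embed_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, num_layers=3, batch_first=True)
        self.proj = nn.Linear(hidden, embed_dim)

    def forward(self, mel_frames):                 # (batch, frames, n_mels)
        _, (h, _) = self.lstm(mel_frames)
        embed = self.proj(h[-1])                   # final hidden state of last layer
        return F.normalize(embed, dim=1)           # unit-norm speaker embedding

class Synthesizer(nn.Module):
    """Stand-in for the Tacotron 2-style text-to-mel network; the speaker
    embedding is concatenated with the text encoding at every step."""
    def __init__(self, vocab=100, embed_dim=256, n_mels=80):
        super().__init__()
        self.text_embed = nn.Embedding(vocab, 256)
        self.decoder = nn.GRU(256 + embed_dim, 512, batch_first=True)
        self.to_mel = nn.Linear(512, n_mels)

    def forward(self, text_ids, speaker_embed):
        t = self.text_embed(text_ids)               # (batch, chars, 256)
        s = speaker_embed.unsqueeze(1).expand(-1, t.size(1), -1)
        out, _ = self.decoder(torch.cat([t, s], dim=-1))
        return self.to_mel(out)                     # (batch, frames, n_mels)

class Vocoder(nn.Module):
    """Stand-in for the WaveNet vocoder: mel spectrogram -> waveform samples."""
    def __init__(self, n_mels=80, hop=200):
        super().__init__()
        self.upsample = nn.Linear(n_mels, hop)

    def forward(self, mel):
        return self.upsample(mel).flatten(1)        # (batch, frames * hop)

# Inference: clone an unseen speaker's voice from a short reference clip.
encoder, synth, vocoder = SpeakerEncoder(), Synthesizer(), Vocoder()
reference = torch.randn(1, 300, 40)                 # ~3 s of reference mel frames
text = torch.randint(0, 100, (1, 50))               # character IDs of the input text
embedding = encoder(reference)                      # trained on speaker verification
waveform = vocoder(synth(text, embedding))          # speech in the reference voice

The design choice the paper stresses is that the speaker encoder is trained separately, on a verification task over thousands of untranscribed speakers, so the synthesizer inherits speaker variability it never saw in its own training data; that separation is what enables generalization to speakers unseen during TTS training.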