TL;DR Baidu's TTS system now supports multi-speaker conditioning, and can learn new speakers with very little data (a la LyreBird). I'm really excited about the recent influx of neural-net TTS systems, but all of them seem to be too slow for real-time dialog, or not publicly available, or both. Hoping that one of them gets a high-quality open-source implementation soon!
Emotion Detection and Recognition market to reach USD 22.65 billion by 2020. This study was conducted at a global level, broadly covering North America, Europe, APAC, the Middle East, and the rest of the world (RoW). The market is projected to grow from USD 5.66 billion in 2015 to USD 22.65 billion by 2020, at a CAGR of 31.9% during the period. Growth is being driven by factors such as an increased focus on affective computing and business intelligence, a growing amount of spatial data, and the ready availability of analytical tools. "Law Enforcement, Surveillance, and Monitoring areas are projected to showcase robust growth in the emotion detection and recognition market": Defense and security agencies require emotion detection technology for surveillance and monitoring purposes. The technology has already seen major deployment in military services, in applications such as lie detectors and polygraph tests. Emotion detection technology helps in matching records in real time and in detecting the stress levels of a criminal.
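As a quick sanity check on the growth figures quoted above, here is a minimal sketch of the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The function name `cagr` and the variable names are illustrative and do not come from the report; the only inputs taken from the text are the USD 5.66 billion (2015) and USD 22.65 billion (2020) values. With those rounded inputs the formula gives roughly 32%, in line with the reported 31.9%; the small gap is presumably down to rounding in the published numbers.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.25 means 25% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the report summary (USD billions).
START_2015 = 5.66
END_2020 = 22.65
YEARS = 5  # 2015 -> 2020

rate = cagr(START_2015, END_2020, YEARS)
print(f"Implied CAGR: {rate:.1%}")

# Year-by-year market size implied by compounding at that rate.
# These intermediate values are derived here, not taken from the report.
projection = {2015 + n: START_2015 * (1 + rate) ** n for n in range(YEARS + 1)}
for year, value in projection.items():
    print(f"{year}: USD {value:.2f} billion")
```

The intermediate years are only what steady compounding would imply; the report summary itself gives just the 2015 and 2020 endpoints.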
Microsoft has reached a milestone in text-to-speech synthesis with a production system that uses deep neural networks to make the voices of computers nearly indistinguishable from recordings of people. With human-like natural prosody and clear articulation of words, Neural TTS significantly reduces listening fatigue when you interact with AI systems. Our team demonstrated our neural-network-powered text-to-speech capability at the Microsoft Ignite conference in Orlando, Florida, this week. The capability is currently available in preview through Azure Cognitive Services Speech Services. Neural text-to-speech can be used to make interactions with chatbots and virtual assistants more natural and engaging, convert digital texts such as e-books into audiobooks, and enhance in-car navigation systems.
Panasonic and Nuance have been close partners on TV voice recognition in the past; we now know that they're getting a bit cozier for Panasonic's 2013 Smart TVs. Nuance's Dragon TV engine will also speak out content and menus if you need more than just visual confirmation of where you're going. Panasonic's refreshed TV line is gradually rolling out over the spring, so those who see a plastic remote control as so very 2010 won't have long to wait. Panasonic's New Smart TVs Now Listen and Speak with Nuance's Dragon TV: Panasonic's New SMART VIERA HDTVs Voice Interaction Lets People Find TV Content, Search the Web, Get Access to Apps and More with the Power of Dragon. Now people can simply sit back and speak to find content, search the web, control volume and more, creating a more interactive and intelligent television experience. And with Dragon TV's text-to-speech, television content and options on the screen can be read aloud.