Speech technology encompasses both speech understanding/recognition and speech synthesis.
The location technology specialist what3words has released a new end-to-end Voice API designed to help people figure out where they're going. To that end, the API allows users to say any three words, and what3words will return the address and GPS coordinates of the place they're trying to reach. The Voice API is built with machine learning and speech recognition technology from Speechmatics, and can be integrated into virtually any service or application. According to what3words, an address request is completed with only a single API call, making the system much simpler to deploy than alternatives that need to blend multiple APIs. The company claims that it takes only a few hours to get the platform up and running.
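The "single API call" claim can be sketched with what3words's public REST interface. The endpoint and parameter names below follow the v3 API as publicly documented, but treat them as assumptions rather than a verified integration; the API key is a placeholder.

```python
# Sketch: resolving a three-word address to coordinates with one request.
# In the voice workflow, `words` would come from the Speechmatics-powered
# recognizer rather than typed input.
from urllib.parse import urlencode

W3W_ENDPOINT = "https://api.what3words.com/v3/convert-to-coordinates"

def build_w3w_request(words: str, api_key: str) -> str:
    """Build the request URL for resolving a three-word address."""
    return f"{W3W_ENDPOINT}?{urlencode({'words': words, 'key': api_key})}"

# The JSON response to this GET request carries the lat/lng coordinates.
url = build_w3w_request("filled.count.soap", "YOUR_API_KEY")
```

The point of the sketch is that the whole address lookup is one GET request, which is what makes the deployment story simpler than stitching together separate speech and geocoding services.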
Otter.ai, an A.I.-powered transcription app and note-takers' best friend, has received a strategic investment from Japan's leading mobile operator and new Otter partner, NTT DOCOMO Inc. The two companies are teaming up to support Otter's expansion into the Japanese market, where DOCOMO will integrate Otter with its own A.I.-based translation subsidiary, Mirai Translation, to provide accurate English transcripts that are then translated into Japanese. The investment was made by DOCOMO's wholly owned subsidiary, NTT DOCOMO Ventures, Inc.; its size was undisclosed, though we're told the new round totaled $10 million. To date, Otter has raised $23 million in funding from NTT DOCOMO Ventures, Fusion Fund, GGV Capital, DFJ Dragon Fund, Duke University Innovation Fund, Harris Barton Asset Management, Slow Ventures, Horizons Ventures, and others.
In the first article of our conversational AI series, we explored how the proliferation of voice assistants and messaging platforms is giving way to a new era of user interfaces (see the sidebar, "A five-part series on conversational AI"). Whether it's in the car, on a phone, or through a smart home device, nearly 112 million US consumers rely on their voice assistants at least once a month, and that number continues to grow.1 Despite the uptick in adoption of voice-enabled virtual assistants, designing effective products is a nontrivial endeavor. Virtual assistants often deal with multiple, sometimes complex scenarios that require understanding a range of queries, to which users expect a quick, accurate, and easily interpretable response. When they fall short, the errors can range from the mundane, such as misinterpreting a request to order a roll of paper towels, to the more troubling, such as providing a harmful health recommendation (or, conversely, an accurate but difficult-to-interpret one).2
Salesforce Einstein is an AI-based assistant that enables companies to deliver smarter, more personalized, and more predictive experiences to their customers. Salesforce Einstein simplifies work for Salesforce developers by reducing the difficulty of building data models. This helps companies predict future behavior and proactively recommend the next best action for users. The features below make Salesforce Einstein easier to use. Voice assistant: this was the key feature of this year's Salesforce release.
The researchers say that their approach improves the perceived naturalness of dubbing, and they highlight the relative importance of each proposed step. As the paper's coauthors note, automatic dubbing involves transcribing speech to text and translating that text into another language before generating speech from the translated text.
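The three-stage pipeline described above (speech recognition, then machine translation, then speech synthesis) can be sketched as a simple function composition. The stage functions here are toy placeholders standing in for the paper's actual models, with a hard-coded example sentence and a two-word dictionary:

```python
# Toy sketch of an automatic dubbing pipeline: ASR -> MT -> TTS.
# Every stage below is a placeholder, not a real model.

def transcribe(audio: bytes) -> str:
    # ASR stage: pretend the audio decodes to this English sentence.
    return "hello world"

def translate(text: str) -> str:
    # MT stage: toy word-for-word English->German dictionary lookup.
    lexicon = {"hello": "hallo", "world": "welt"}
    return " ".join(lexicon.get(word, word) for word in text.split())

def synthesize(text: str) -> str:
    # TTS stage: tag the translated text as "synthesized audio".
    return f"<audio:{text}>"

def dub(audio: bytes) -> str:
    # The full dubbing chain: transcribe, translate, then re-synthesize.
    return synthesize(translate(transcribe(audio)))

dubbed = dub(b"raw-audio-bytes")  # → "<audio:hallo welt>"
```

A real system would also need the prosodic alignment and timing steps the paper proposes, which is precisely where the "relative importance of each step" question arises.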
Over the years that I have spent with startups, I've come across both genuine and fake AI products. I'll start with the ones that truly solved problems using AI. A few years ago, one of the co-founders of Liv.ai, a Bengaluru-based AI startup, met me and demonstrated their product, which used natural language processing to convert speech to text in multiple Indian languages. I had always known that text-to-speech was easy, but converting speech to text in multiple languages was a hard problem to solve. I was a bit sceptical at first, but when I saw the product, I was quite blown away.
Alexa, are you eavesdropping on me? I passive-aggressively ask my Amazon Echo this question every so often. Because as useful as AI has become, it's also very creepy. And this, of course, produces privacy nightmares, as when Amazon or Google subcontractors sit around listening to our audio snippets or hackers remotely spy on our kids. The problem here is structural.
Microsoft is one of the market leaders in providing infrastructure as a service (IaaS) and platform as a service (PaaS) solutions. Microsoft Azure is the project that has not only benefitted the company in terms of ROI but has also changed the business dynamics of organizations around the globe, and more and more companies are adopting Azure for their cloud and data products. Microsoft Azure AI was launched in 2018 and has emerged as a success in the artificial intelligence services market too. Azure AI is a set of AI services built on Microsoft's breakthrough innovation from decades of world-class research in vision, speech, language processing, and custom machine learning.
Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge.
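The parallelization contrast above can be made concrete. In an auto-regressive vocoder each output sample is conditioned on the previous one, forcing a sequential loop; in a flow-based feed-forward model like WaveGlow, each sample depends only on the conditioning input, so all samples can in principle be computed at once. The "models" below are trivial stand-ins, not WaveGlow itself:

```python
import math

def autoregressive_vocoder(n: int) -> list[float]:
    # Sample t depends on sample t-1, so generation is inherently serial.
    samples = [0.0]
    for t in range(1, n):
        samples.append(0.5 * samples[-1] + math.sin(0.1 * t))
    return samples

def feedforward_vocoder(n: int) -> list[float]:
    # Each sample is a function of the conditioning input (here, just t)
    # alone, so every iteration of this loop could run in parallel.
    return [math.sin(0.1 * t) for t in range(n)]
```

The feed-forward version maps cleanly onto GPU parallelism, which is WaveGlow's advantage; the paper's caveat is that even then the model's per-sample cost is too high for real-time synthesis on edge hardware.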