The new system, dubbed the Translatotron, has three components, all of which look at the speaker's audio spectrogram, a visual snapshot of the frequencies present in the sound over time, often called a voiceprint. The first component uses a neural network trained to map the audio spectrogram in the input language to the audio spectrogram in the output language. The second converts the spectrogram into an audio wave that can be played.
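For readers curious what a spectrogram actually contains, here is a minimal sketch in Python. It is not Translatotron's code: the function names, frame size, hop length and the 1 kHz test tone are all illustrative assumptions, and a naive transform is used for clarity rather than speed. It simply slices audio into short overlapping frames and measures how much energy sits at each frequency in each frame.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per overlapping frame.

    Illustrative only: a naive short-time Fourier transform, not the
    feature extraction used by any particular translation system.
    """
    # A Hann window tapers each frame to reduce leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Only bins 0..frame_size/2 are needed for real-valued audio.
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


def dominant_bin(frame):
    """Index of the frequency bin with the largest magnitude."""
    return max(range(len(frame)), key=frame.__getitem__)


# Assumed test signal: a pure 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
spec = spectrogram(tone)

# Each bin spans SAMPLE_RATE / frame_size = 125 Hz.
peak_hz = dominant_bin(spec[0]) * SAMPLE_RATE / 64
print(len(spec), "frames; strongest frequency in first frame:", peak_hz, "Hz")
```

Real speech would produce a far busier picture than a single tone, with energy spread across many bins that shift from frame to frame. That changing pattern is what a spectrogram-to-spectrogram model takes as input and produces as output.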
On Wednesday, Google unveiled Translatotron, an in-development speech-to-speech translation system. It's not the first system to translate speech from one language to another, but Google designed Translatotron to do something other systems can't: retain the original speaker's voice in the translated audio. In other words, the tech could make it sound like you're speaking a language you don't know -- a remarkable step forward on the path to breaking down the global language barrier. According to Google's AI blog, most speech-to-speech translation systems follow a three-step process. First they transcribe the speech.
The rhetoric surrounding AI and robots has some believing that we are nearing the ability to introduce something like Joi, the AI hologram from Blade Runner 2049. In fact, this kind of advancement remains in the realm of fiction, but the AI Index Annual Report 2017 shows that AI is fighting to level the playing field in the battle of humans versus machines. With artificial intelligence technologies being developed for a wide range of applications, the AI Index revealed several surprising insights on where humans stand in the robot vs biological brain race. While robots easily outperform regular employees in certain visual tasks, natural language processing is not yet superior to human capability. Scientists reached a major breakthrough this year, with tests revealing that the best artificial intelligence system recognised speech from phone call audio with 95% accuracy, neck-and-neck with human ability.
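To make a figure like 95% concrete, the sketch below assumes it refers to word-level accuracy, which is commonly reported as one minus the word error rate. The report's exact scoring method is not described here, so treat this as an illustration of the usual metric, not a reconstruction of the test. The function name and both sentences are made up for the example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    Counts the substitutions, insertions and deletions needed to turn
    the recognised text into the reference transcript.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from output
                            curr[j - 1] + 1,      # extra word in output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Made-up 20-word transcript with a single misheard word ("farm" -> "form").
reference = ("the quick brown fox jumps over the lazy dog and then runs "
             "far away from the farm before the rain")
hypothesis = ("the quick brown fox jumps over the lazy dog and then runs "
              "far away from the form before the rain")

wer = word_error_rate(reference, hypothesis)
accuracy = 1 - wer
print(f"word error rate: {wer:.2f}, accuracy: {accuracy:.0%}")
```

On this measure, one wrong word in every twenty is what 95% looks like in practice.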
Voice control was all the rage at CES 2017 and this year's show appears to continue the trend. This time, another big name in audio is getting into the game: Klipsch. The company will have options for both Alexa and Google Assistant, so you'll have some choice when it comes to the new feature and new audio gear.
Speech processing is a very popular area of machine learning. There is significant demand for transforming human speech into text and text into speech. It is especially important for the development of self-service systems in different places: shops, transport, hotels, etc. Machines are replacing more and more of the human labor force, and these machines should be able to communicate with us using our language. That's why speech recognition is a promising and significant area of artificial intelligence and machine learning. Today, many large companies provide APIs for performing different machine learning tasks. Speech recognition is no exception. You don't have to be an expert in natural language processing to use these APIs.