The speech age
Researchers at MIT have developed a new approach to training speech recognition systems that does not depend on transcriptions, as current approaches do. Instead, their system analyses correspondences between images and spoken descriptions of those images, as captured in a large collection of audio recordings. The system then learns which acoustic features of the recordings correlate with which image characteristics. Traditionally, speech recognition systems, such as those that convert speech to text on smartphones, are the product of machine learning systems that go over many thousands of utterances and their transcriptions to learn a mapping between acoustic features and words. While this method works quite well, the requirement for professional-grade transcription makes it costly and time-consuming.
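To make the idea of learning from image and audio pairs more concrete, here is a minimal sketch of one common way such a correspondence could be trained. It is not taken from the MIT work: the article does not say how the system scores a match, so the dot-product score, the margin ranking loss, the toy two-dimensional vectors and the function names `margin_ranking_loss` and `best_image_for` are all assumptions made for illustration. In a real system the vectors would come from learned audio and image encoders.

```python
# Illustrative sketch only. The vectors below are made-up stand-ins for the
# outputs of learned image and audio encoders, which the article does not describe.

def dot(u, v):
    """Similarity score between two embedding vectors (assumed: dot product)."""
    return sum(a * b for a, b in zip(u, v))


def margin_ranking_loss(image_vecs, audio_vecs, margin=1.0):
    """Average hinge loss over matched pairs.

    Pair i (image_vecs[i], audio_vecs[i]) is a photo and its spoken description.
    The loss is zero once every matched pair scores at least `margin` higher
    than every mismatched pair that shares its image or its recording.
    """
    n = len(image_vecs)
    total = 0.0
    for i in range(n):
        matched = dot(image_vecs[i], audio_vecs[i])
        for j in range(n):
            if j == i:
                continue
            # Image i scored against the wrong recording j.
            total += max(0.0, margin - matched + dot(image_vecs[i], audio_vecs[j]))
            # Recording i scored against the wrong image j.
            total += max(0.0, margin - matched + dot(image_vecs[j], audio_vecs[i]))
    return total / n


def best_image_for(audio_vec, image_vecs):
    """Index of the image whose embedding scores highest against a recording."""
    scores = [dot(img, audio_vec) for img in image_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Two toy image embeddings and the embeddings of their spoken descriptions.
images = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.9, 0.1], [0.2, 0.8]]

loss = margin_ranking_loss(images, audio)
match = best_image_for(audio[0], images)
```

No transcription appears anywhere in this sketch: the only supervision is which recording was made for which image, which is the property the article highlights.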
Feb-6-2017, 01:50:03 GMT