IBM Sets New Transcription Performance Milestone on Automatic Broadcast News Captioning
Two years ago IBM set new performance records on conversational telephone speech (CTS) transcription, by benchmarking its deep neural network based speech recognition systems on the Switchboard and Callhome corpora, two popular publicly available data sets for automatic speech recognition [1]. Here we show that this impressive performance holds on other audio genres. Similar to the CTS benchmarks, the industry has for many years evaluated system performances on multimedia audio signals with broadcast news (BN) captioning. We have now achieved a new industry record of 6.5% and 5.9% on two BN benchmarks: RT04 and DEV04F [2]. Both these test sets have been released in the past by the Linguistic Data Consortium (LDC) [3].
Jun-2-2019, 19:50:16 GMT