Mozilla's open source voice recognition tool nears human-like accuracy
On Wednesday the free-software company also released a first set of crowdsourced recordings under its Common Voice project, which is designed to let anyone train and test machine learning algorithms to recognize speech. The dataset includes almost 400,000 downloadable samples, adding up to 500 hours of speech. More than 20,000 people from around the world have responded to a call for recordings, which Mozilla hopes will help future voice-powered systems fluently understand a wide variety of accents and types of speech. "We at Mozilla believe technology should be open and accessible to all, and that includes voice," Mozilla Senior Vice President of Emerging Technologies Sean White wrote in a blog post. The speech recognition tool, called DeepSpeech, has a word error rate of about 6.5%, better than the company's stated goal of 10%, but still shy of Microsoft's achievement this year of 5.5%.
IBM inches toward human-like accuracy for speech recognition
Microsoft claimed to reach a 5.9 percent word error rate last October using neural language models resembling associative word clouds. At the time, the company believed 5.9 percent was equivalent to human parity. But IBM says it's not popping the champagne yet. "As part of our process in reaching today's milestone, we determined human parity is actually lower than what anyone has yet achieved -- at 5.1 percent," George Saon, IBM principal research scientist, wrote in a blog post this week. IBM reached the 5.5 percent milestone by combining Long Short-Term Memory (LSTM) models, a type of artificial neural network, and WaveNet language models with three strong acoustic models.
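For readers unfamiliar with the metric, the word error rates quoted above are conventionally computed as the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only; the function name `word_error_rate` and the sample sentences are made up for illustration, and it is not the scoring code used by Mozilla, IBM or Microsoft, whose evaluation pipelines also normalize text in ways not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Illustrative helper, not taken from any vendor's toolkit. Words are
    split on whitespace and compared exactly, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (Levenshtein over words).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"):
# 2 errors over 6 reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

On this definition, a 6.5% rate means roughly 6 or 7 word-level errors per 100 reference words, and the gap between 5.9 and 5.5 percent amounts to about four fewer errors per thousand words.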