New Datasets to Democratize Speech Recognition Technology
The next wave of AI will be powered by the democratization of data. Open-source frameworks such as TensorFlow and PyTorch have brought machine learning to a huge developer base, but most state-of-the-art models still rely on training datasets that are either wholly proprietary or prohibitively expensive to license [1]. As a result, the best automated speech recognition (ASR) models for converting speech audio into text are available only commercially and are trained on data unavailable to the general public. Furthermore, because of market incentives, only widely spoken languages receive industry attention, which limits cutting-edge speech technology to English and a handful of other languages. One obstacle is prohibitive licensing: several free datasets do exist, but most of those with sufficient size and quality to make models truly shine are barred from commercial use. In response, we created The People's Speech, a massive English-language dataset of audio transcriptions of full sentences (see Sample 1).
Jan-15-2022, 07:10:28 GMT