More public data key to democratizing ML, says MLCommons

Unless you're an English speaker, and one with as neutral an American accent as possible, you've probably butted heads with a digital assistant that couldn't understand you. With any luck, a couple of open-source datasets from MLCommons could help future systems grok your voice. The two datasets, which were made generally available in December, are the People's Speech Dataset (PSD), a 30,000-hour database of spontaneous English speech; and the Multilingual Spoken Words Corpus (MSWC), a dataset of some 340,000 keywords in 50 languages. By making both datasets publicly available under CC-BY and CC-BY-SA licenses, MLCommons hopes to democratize machine learning – that is to say, make it available to everyone – and help push the industry toward data-centric AI. David Kanter, executive director and founder of MLCommons, told Nvidia in a podcast this week that he sees data-centric AI as a conceptual pivot from "which model is the most accurate," to "what can we do with data to improve model accuracy."
