More public data key to democratizing ML, says MLCommons

Apr-19-2022, 14:31:16 GMT–AITopics Custom Links

Unless you're an English speaker, and one with as neutral an American accent as possible, you've probably butted heads with a digital assistant that couldn't understand you. With any luck, a couple of open-source datasets from MLCommons could help future systems grok your voice.

The two datasets, which were made generally available in December, are the People's Speech Dataset (PSD), a 30,000-hour database of spontaneous English speech, and the Multilingual Spoken Words Corpus (MSWC), a dataset of some 340,000 keywords in 50 languages.

By making both datasets publicly available under CC-BY and CC-BY-SA licenses, MLCommons hopes to democratize machine learning (that is, make it available to everyone) and help push the industry toward data-centric AI. David Kanter, executive director and founder of MLCommons, told Nvidia in a podcast this week that he sees data-centric AI as a conceptual pivot from "which model is the most accurate" to "what can we do with data to improve model accuracy."
