New research highlights how error-ridden data used to train AI is - RealKM
Originally posted on The Horizons Tracker. The world is awash with data, and it's tempting to think that all of this data is used to train the AI systems that are increasingly prevalent around the world. New research from MIT highlights that AI is often trained on relatively small samples of curated data, and that this data often contains errors that undermine the training of machine learning algorithms. Indeed, across 10 of the most-cited datasets used by scientists to train machine learning systems, the researchers found that 3% of the data was mislabeled or inaccurate. It has long been suspected that the data used to train AI systems is not what it could be, but until now no one has been able to quantify just how poor it is.
Jul-26-2021, 13:45:11 GMT