MIT study finds 'systematic' labeling errors in popular AI benchmark datasets

Apr-2-2021, 21:45:10 GMT

The field of AI and machine learning is arguably built on the shoulders of a few hundred papers, many of which draw conclusions using data from a subset of public datasets. Large, labeled corpora have been critical to the success of AI in domains ranging from image classification to audio classification. That's because their annotations expose comprehensible patterns to machine learning algorithms, in effect telling machines what to look for in future datasets so they can make predictions. But while labeled data is usually equated with ground truth, datasets can, and do, contain errors. The processes used to construct corpora often involve some degree of automatic annotation or crowdsourcing, techniques that are inherently error-prone.
