No, You're Not Alone. Google Is Also Making This Big Mistake On AI
Just this past month, an article reported that over 30% of the data used to train one of Google's shared machine learning models was mislabeled. Not only was the model itself full of errors, the training data behind it was riddled with mistakes. How could anyone using Google's model trust its results when it is built on human-induced errors that computers can't fix? And Google isn't alone in major data mislabeling: a 2021 MIT study found that almost 6% of the images in the industry-standard ImageNet validation set are mislabeled, and identified "label errors in the test sets of 10 of the most commonly-used computer vision, natural language, and audio datasets". How can we hope to trust or use these models when the data used to train them is so flawed?
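Label errors like the ones the MIT study found are often surfaced by comparing a model's predicted probabilities against the given labels. Below is a minimal, hypothetical sketch of that idea (in the spirit of confident learning, not the study's actual code): an example is flagged when the probability assigned to its given label falls below the average self-confidence for that class. The function name, data, and threshold rule are all illustrative assumptions.

```python
# Hypothetical sketch of confident-learning-style label-error detection:
# flag examples whose given label receives unusually low predicted probability.
import numpy as np

def find_label_issues(pred_probs, labels):
    """Return indices of likely mislabeled examples.

    pred_probs: (n, k) array of out-of-sample predicted class probabilities
    labels:     (n,) array of given (possibly noisy) integer labels
    """
    labels = np.asarray(labels)
    # Probability the model assigns to each example's *given* label.
    self_conf = pred_probs[np.arange(len(labels)), labels]
    # Per-class average self-confidence serves as the flagging threshold.
    thresholds = np.array([
        self_conf[labels == c].mean() for c in range(pred_probs.shape[1])
    ])
    return np.flatnonzero(self_conf < thresholds[labels])

# Toy data: example 2 is labeled class 0, but the model is confident it's class 1.
probs = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],   # likely mislabeled
    [0.2, 0.8],
])
labels = [0, 0, 0, 1]
print(find_label_issues(probs, labels))  # → [2]
```

In practice this kind of check is only as good as the held-out probabilities it is fed, which is why the MIT researchers combined algorithmic flagging with human review before declaring a label an error.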
Sep-10-2022, 17:56:00 GMT
- Technology: