[P] The unreasonable usefulness of deep learning in building and cleaning medical image datasets • r/MachineLearning

@machinelearnbot 

One thing I find weird is that there is lots of discussion of deep learning for complex detection and recognition tasks, but very few people talk about how useful it can be for simple but time-consuming image data processing tasks, particularly in medical research. In this post I spend some time cleaning up the CXR14 dataset, and in 4 hours I find 430 images with various problems that shouldn't be in the dataset (a CSV identifying these images is included in the post). While the prevalence of these problems is super low (50/100,000), the visual task is very easy, so the models can achieve absurdly low false positive rates. I even get an AUROC of 1.0 on a 2,000-image validation set for one task :) As a result, cleaning this dataset of 3 different problems took under a day rather than weeks of poring over each image. Certainly nothing in the post is technically groundbreaking, but hopefully it is a prompt to consider deep learning when you are facing time-consuming processing work.
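Why the false positive rate matters so much at this prevalence can be shown with a little arithmetic. Below is a minimal sketch, not taken from the post: it assumes a model that flags images for human review, and the function names `triage_estimate` and `auroc` are hypothetical. The 50/100,000 prevalence comes from the post; the 100,000-image dataset size and a sensitivity of 1.0 are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass
class TriageEstimate:
    flagged: float         # expected number of images flagged for human review
    true_positives: float  # expected flagged images that really have the problem
    precision: float       # fraction of flagged images that are real problems


def triage_estimate(n_images: int, prevalence: float,
                    sensitivity: float, false_positive_rate: float) -> TriageEstimate:
    """Expected review workload when a classifier pre-screens a dataset (hypothetical helper).

    prevalence: fraction of images with the problem (e.g. 50 / 100_000).
    sensitivity: fraction of problem images the model flags.
    false_positive_rate: fraction of clean images the model flags.
    """
    positives = n_images * prevalence
    negatives = n_images - positives
    tp = positives * sensitivity
    fp = negatives * false_positive_rate
    flagged = tp + fp
    precision = tp / flagged if flagged else 0.0
    return TriageEstimate(flagged, tp, precision)


def auroc(scores_pos, scores_neg) -> float:
    """Rank-based AUROC: probability that a random positive outscores a random negative.

    Ties count as half a win. An AUROC of 1.0 means every positive image
    scored strictly higher than every negative image. This pairwise O(n*m)
    version is fine for a validation set of a few thousand images.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Assumed scenario: 100,000 images, prevalence 50/100,000, model catches every problem image.
    for fpr in (0.01, 0.001):
        est = triage_estimate(100_000, 50 / 100_000, sensitivity=1.0, false_positive_rate=fpr)
        print(f"FPR={fpr}: review {est.flagged:.0f} images, precision {est.precision:.3f}")
```

Under these assumptions, a 1% false positive rate still leaves roughly 1,050 images to check by hand (under 5% of them real problems), while 0.1% cuts that to roughly 150. Either way it beats looking at all 100,000, but only a very low false positive rate makes the flagged set mostly signal. In practice `sklearn.metrics.roc_auc_score` computes the same AUROC more efficiently.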
