Precision-Recall-Gain Curves: PR Analysis Done Right

Peter A. Flach

Neural Information Processing Systems

Precision-Recall analysis abounds in applications of binary classification where true negatives do not add value and hence should not affect assessment of the classifier's performance. Perhaps inspired by the many advantages of receiver operating characteristic (ROC) curves and the area under such curves for accuracy-based performance assessment, many researchers have taken to reporting Precision-Recall (PR) curves and associated areas as a performance metric. We demonstrate in this paper that this practice is fraught with difficulties, mainly because of incoherent scale assumptions: e.g., the area under a PR curve takes the arithmetic mean of precision values whereas the Fβ score applies the harmonic mean.
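The scale mismatch the abstract points to can be made concrete with a small sketch (not taken from the paper): the Fβ score is a weighted harmonic mean of precision and recall, while averaging precision values, as the area under a PR curve does, uses the arithmetic mean. For a skewed operating point the two means diverge sharply.

```python
# Illustration only: contrast the arithmetic mean of precision and recall
# with the F-beta score, which is their weighted harmonic mean.

def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (the F-beta score)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.9, 0.1                 # a skewed operating point
arithmetic = (p + r) / 2        # 0.5  -- looks moderately good
harmonic = f_beta(p, r)         # 0.18 -- heavily penalises the weak recall
print(arithmetic, harmonic)
```

The harmonic mean is dominated by the smaller of the two values, which is why a metric built on arithmetic averaging of precision and one built on Fβ rank classifiers on incompatible scales.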


Towards Reliable Dermatology Evaluation Benchmarks

Gröger, Fabian, Lionetti, Simone, Gottfrois, Philippe, Gonzalez-Jimenez, Alvaro, Groh, Matthew, Daneshjou, Roxana, Consortium, Labelling, Navarini, Alexander A., Pouly, Marc

arXiv.org Artificial Intelligence

Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data-cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the International Skin Imaging Collaboration. Along with this paper, we publish revised file lists for each dataset which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.