Do machines actually beat doctors? ROC curves and performance metrics