Challenge of Directly Comparing Imaging-Based Diagnoses Made by Machine Learning Algorithms With Those Made by Human Clinicians