Dis-AE: Multi-domain & Multi-task Generalisation on Real-World Clinical Data

Daniel Kreuter, Samuel Tull, Julian Gilbey, Jacobus Preller, BloodCounts! Consortium, John A. D. Aston, James H. F. Rudd, Suthesh Sivapalaratnam, Carola-Bibiane Schönlieb, Nicholas Gleadall, Michael Roberts

arXiv.org Artificial Intelligence 

Machine learning has promised to revolutionise healthcare for several years [1, 2]. However, while there is an extensive literature describing high-performing machine learning models trained on immaculate benchmark datasets [3-5], such promising approaches rarely make it into clinical practice [6]. Often, this is because of an unexpected drop in performance when the model is deployed on unseen test data, caused by domain shift [7, 8]: a change in the data distribution between the dataset a model is trained on (source data) and that on which it is deployed (target data). Most common machine learning algorithms rely on the assumption that the source and target data are independent and identically distributed (i.i.d.) [9]. Under domain shift, however, this assumption no longer holds, and model performance can be significantly affected. Domain shift is widespread in medical datasets, resulting from differences in equipment and clinical practice between sites [10-13], and models are vulnerable to basing their predictions on clinically irrelevant features specific to the domain, a failure mode known as shortcut learning [14], which may lead to poor performance on target data. For most medical applications, target data is rarely available prior to real-time deployment; thus, a domain adaptation approach, in which pre-trained models are fine-tuned on data from the target distribution, is not feasible.
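The failure mode described above can be illustrated with a toy NumPy sketch (the synthetic data, features, and thresholds below are illustrative assumptions, not from the paper). A "shortcut" classifier that latches onto a domain-specific artefact scores perfectly on source data but collapses to chance on a target domain where the artefact is no longer correlated with the label, while a classifier using the genuine, noisier signal degrades far less:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

def make_domain(shortcut_correlated):
    """Generate one domain: a genuine noisy signal x and an artefact s."""
    y = rng.integers(0, 2, n)
    x = y + rng.normal(0, 0.8, n)  # weak but clinically genuine signal
    # Domain-specific artefact s (e.g. an equipment offset): in the source
    # domain it happens to track the label; in the target domain it does not.
    s = y.astype(float) if shortcut_correlated else rng.integers(0, 2, n).astype(float)
    return np.column_stack([x, s]), y

X_src, y_src = make_domain(True)    # source: artefact correlated with label
X_tgt, y_tgt = make_domain(False)   # target: correlation broken (domain shift)

# "Shortcut" model: predicts from the artefact feature s alone.
shortcut_pred = lambda X: (X[:, 1] > 0.5).astype(int)
# "Robust" model: predicts from the genuine signal x alone.
robust_pred = lambda X: (X[:, 0] > 0.5).astype(int)

def acc(pred, X, y):
    return (pred(X) == y).mean()

print(f"shortcut model: source {acc(shortcut_pred, X_src, y_src):.2f}, "
      f"target {acc(shortcut_pred, X_tgt, y_tgt):.2f}")
print(f"robust model:   source {acc(robust_pred, X_src, y_src):.2f}, "
      f"target {acc(robust_pred, X_tgt, y_tgt):.2f}")
```

The shortcut model is perfect on the source domain yet near chance on the target domain, mirroring the i.i.d. assumption breaking down, whereas the robust model's accuracy is similar in both domains.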
