Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms

Khattak, Gul Rukh, Patlatzoglou, Konstantinos, Barker, Joseph, Pastika, Libor, Zeidaabadi, Boroumand, El-Medany, Ahmed, Aggour, Hesham, Liang, Yixiu, Ribeiro, Antonio H., Annis, Jeffrey, Ribeiro, Antonio Luiz Pinho, Ge, Junbo, Kramer, Daniel B., Waks, Jonathan W., Brittain, Evan, Peters, Nicholas, Ng, Fu Siong, Sau, Arunashis

arXiv.org Artificial Intelligence 

Department of Cardiology, Imperial College Healthcare NHS Trust, London, United Kingdom

Heart and Lung Institute, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN

Disclosures: JWW and DBK were previously on the advisory board for Heartcor solutions LLC, for whom they remain independent consultants. JWW reports research funding from Anumana and is a consultant for HeartBeam Inc. FSN reports speaker fees from GE Healthcare and is on the advisory board for AstraZeneca. The remaining authors have no conflicts to declare.

Abstract

Contrastive learning is a widely adopted self-supervised pretraining strategy, yet its dependence on cohort composition remains underexplored. We systematically assess how cohort demographics, health status, and population diversity influence downstream performance on prediction tasks, also including two additional cohorts from another continent (Europe). We find that downstream performance depends on the distributional properties of the pretraining cohort, including demographics and health status. Moreover, while pretraining with a multi-centre, demographically diverse cohort improves in-distribution accuracy, it reduces out-of-distribution (OOD) generalisation of our contrastive approach by encoding cohort-specific artifacts.
