Predicting microsatellite instability and key biomarkers in colorectal cancer from H&E-stained images: Achieving SOTA predictive performance with fewer data using Swin Transformer

Guo, Bangwei; Li, Xingyu; Jonnagaddala, Jitendra; Zhang, Hong; Xu, Xu Steven

arXiv.org Artificial Intelligence 

Artificial intelligence (AI) models have been developed to predict clinically relevant biomarkers for colorectal cancer (CRC), including microsatellite instability (MSI). However, existing deep-learning networks are data-hungry and require large training datasets, which are often lacking in the medical domain. In this study, based on the latest Hierarchical Vision Transformer using Shifted Windows (Swin-T), we developed an efficient workflow for predicting biomarkers in CRC (MSI, hypermutation, chromosomal instability, CpG island methylator phenotype, and BRAF and TP53 mutations) that required relatively small datasets but achieved state-of-the-art (SOTA) predictive performance. Our Swin-T workflow substantially outperformed published models in an intra-study cross-validation experiment using the TCGA-CRC-DX dataset (N = 462). It also demonstrated excellent generalizability in cross-study external validation, delivering a SOTA AUROC of 0.90 for MSI when trained on the MCO dataset (N = 1065) and tested on TCGA-CRC-DX. A similar performance (AUROC = 0.91) was achieved by Echle et al. using a ResNet18 model trained on ~8000 samples and evaluated on the same testing dataset. Swin-T was extremely efficient when using small training datasets and exhibited robust predictive performance with only 200-500 training samples. These data indicate that Swin-T could be 5-10 times more data-efficient than existing MSI prediction algorithms based on ResNet18 and ShuffleNet. Furthermore, the Swin-T models showed promise as pre-screening tests for MSI status and BRAF mutation status: they could exclude a portion of samples before subsequent standard testing in a cascading diagnostic workflow, thereby reducing turnaround time and costs.
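To make the two evaluation ideas in the abstract concrete (AUROC as the performance metric, and a pre-screening cutoff that excludes likely-negative samples before standard testing), the following is a minimal sketch and not the authors' implementation. The function names `auroc` and `prescreen_threshold`, the `min_sensitivity` parameter, and the toy scores are all assumptions introduced here for illustration; the abstract does not state how an operating point would be selected.

```python
import math
from typing import List, Tuple


def auroc(scores: List[float], labels: List[int]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative (pairwise Mann-Whitney formulation); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def prescreen_threshold(
    scores: List[float], labels: List[int], min_sensitivity: float = 0.95
) -> Tuple[float, float, float]:
    """Hypothetical operating-point rule: choose the highest cutoff that still
    keeps at least `min_sensitivity` of the true positives.

    Returns (threshold, achieved_sensitivity, fraction_excluded). Samples
    scoring below the threshold would skip follow-up standard testing.
    """
    if not 0.0 < min_sensitivity <= 1.0:
        raise ValueError("min_sensitivity must be in (0, 1]")
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one positive sample")
    # Number of positives that must score at or above the cutoff.
    k = max(1, math.ceil(min_sensitivity * len(pos)))
    threshold = pos[k - 1]
    sensitivity = sum(1 for s in pos if s >= threshold) / len(pos)
    excluded = sum(1 for s in scores if s < threshold) / len(scores)
    return threshold, sensitivity, excluded


# Synthetic patient-level scores (illustrative only, not data from the study).
toy_scores = [0.92, 0.81, 0.40, 0.35, 0.30, 0.22, 0.15, 0.10, 0.08, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print("AUROC:", round(auroc(toy_scores, toy_labels), 3))
    print("threshold, sensitivity, excluded:",
          prescreen_threshold(toy_scores, toy_labels))
```

On the toy data, the cutoff keeps every positive sample while excluding the lower-scoring negatives from further testing, which is the trade-off a cascading workflow relies on. In practice the cutoff would need to be fixed on training or validation data rather than on the test cohort.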