Evaluation of state-of-the-art deep learning models in the segmentation of the heart ventricles in parasternal short-axis echocardiograms

Buritica, Julian Rene Cuellar, Dinh, Vu, Burri, Manjula, Roelandts, Julie, Wendling, James, Klingensmith, Jon D.

arXiv.org Artificial Intelligence 

Previous studies on echocardiogram segmentation are focused on the left ventricle in parasternal long-axis views. In this study, deep-learning models were evaluated on the segmentation of the ventricles in parasternal short-axis echocardiograms (PSAX-echo). Segmentation of the ventricles in complementary echocardiogram views will allow the computation of important metrics with the potential to aid in diagnosing cardio-pulmonary diseases and other cardiomyopathies. Evaluating state-of-the-art models with small datasets can reveal if they improve performance on limited data. PSAX-echo were performed on 33 volunteer women. An experienced cardiologist identified end-diastole and end-systole frames from 387 scans, and expert observers manually traced the contours of the cardiac structures. Traced frames were pre-processed and used to create labels to train 2 specific-domain (Unet-Resnet101 and Unet-ResNet50), and 4 general-domain (3 Segment Anything (SAM) variants, and the Detectron2) deep-learning models. The performance of the models was evaluated using the Dice similarity coefficient (DSC), Hausdorff distance (HD), and difference in cross-sectional area (DCSA). The Unet-Resnet101 model provided superior performance in the segmentation of the ventricles with 0.83, 4.93 pixels, and 106 pixel2 on average for DSC, HD, and DCSA respectively. A fine-tuned MedSAM model provided a performance of 0.82, 6.66 pixels, and 1252 pixel2, while the Detectron2 model provided 0.78, 2.12 pixels, and 116 pixel2 for the same metrics respectively. Deep-learning models are suitable for the segmentation of the left and right ventricles in PSAX-echo. This study demonstrated that specific-domain trained models such as Unet-ResNet provide higher accuracy for echo segmentation than general-domain segmentation models when working with small and locally acquired datasets.