Goto

Collaborating Authors

 echocardiography


EchoAgent: Guideline-Centric Reasoning Agent for Echocardiography Measurement and Interpretation

Daghyani, Matin, Wang, Lyuyang, Hashemi, Nima, Medhat, Bassant, Abdelsamad, Baraa, Velez, Eros Rojas, Li, XiaoXiao, Tsang, Michael Y. C., Luong, Christina, Tsang, Teresa S. M., Abolmaesumi, Purang

arXiv.org Artificial Intelligence

Purpose: Echocardiographic interpretation requires video-level reasoning and guideline-based measurement analysis, which current deep learning models for cardiac ultrasound do not support. We present EchoAgent, a framework that enables structured, interpretable automation for this domain. Methods: EchoAgent orchestrates specialized vision tools under Large Language Model (LLM) control to perform temporal localization, spatial measurement, and clinical interpretation. A key contribution is a measurement-feasibility prediction model that determines whether anatomical structures are reliably measurable in each frame, enabling autonomous tool selection. We curated a benchmark of diverse, clinically validated video-query pairs for evaluation. Results: EchoAgent achieves accurate, interpretable results despite added complexity of spatiotemporal video analysis. Outputs are grounded in visual evidence and clinical guidelines, supporting transparency and traceability. Conclusion: This work demonstrates the feasibility of agentic, guideline-aligned reasoning for echocardiographic video analysis, enabled by task-specific tools and full video-level automation. EchoAgent sets a new direction for trustworthy AI in cardiac ultrasound.




Video CLIP Model for Multi-View Echocardiography Interpretation

Takizawa, Ryo, Kodera, Satoshi, Kabayama, Tempei, Matsuoka, Ryo, Ando, Yuta, Nakamura, Yuto, Settai, Haruki, Takeda, Norihiko

arXiv.org Artificial Intelligence

Echocardiography records ultrasound videos of the heart, enabling clinicians to assess cardiac function. Recent advances in large-scale vision-language models (VLMs) have spurred interest in automating echocardiographic interpretation. However, most existing medical VLMs rely on single-frame (image) inputs, which can reduce diagnostic accuracy for conditions identifiable only through cardiac motion. In addition, echocardiographic videos are captured from multiple views, each varying in suitability for detecting specific conditions. Leveraging multiple views may therefore improve diagnostic performance. We developed a video-language model that processes full video sequences from five standard views, trained on 60,747 echocardiographic video-report pairs. We evaluated the gains in retrieval performance from video input and multi-view support, including the contributions of various pretrained models. Code and model weights are available at https://github.com/UTcardiology/video-echo-clip


Estimation of Segmental Longitudinal Strain in Transesophageal Echocardiography by Deep Learning

Taskén, Anders Austlid, Judge, Thierry, Berg, Erik Andreas Rye, Yu, Jinyang, Grenne, Bjørnar, Lindseth, Frank, Aakhus, Svend, Jodoin, Pierre-Marc, Duchateau, Nicolas, Bernard, Olivier, Kiss, Gabriel

arXiv.org Artificial Intelligence

Segmental longitudinal strain (SLS) of the left ventricle (LV) is an important prognostic indicator for evaluating regional LV dysfunction, in particular for diagnosing and managing myocardial ischemia. Current techniques for strain estimation require significant manual intervention and expertise, limiting their efficiency and making them too resource-intensive for monitoring purposes. This study introduces the first automated pipeline, autoStrain, for SLS estimation in transesophageal echocardiography (TEE) using deep learning (DL) methods for motion estimation. We present a comparative analysis of two DL approaches: TeeFlow, based on the RAFT optical flow model for dense frame-to-frame predictions, and TeeTracker, based on the CoTracker point trajectory model for sparse long-sequence predictions. As ground truth motion data from real echocardiographic sequences are hardly accessible, we took advantage of a unique simulation pipeline (SIMUS) to generate a highly realistic synthetic TEE (synTEE) dataset of 80 patients with ground truth myocardial motion to train and evaluate both models. Our evaluation shows that TeeTracker outperforms TeeFlow in accuracy, achieving a mean distance error in motion estimation of 0.65 mm on a synTEE test dataset. Clinical validation on 16 patients further demonstrated that SLS estimation with our autoStrain pipeline aligned with clinical references, achieving a mean difference (95\% limits of agreement) of 1.09% (-8.90% to 11.09%). Incorporation of simulated ischemia in the synTEE data improved the accuracy of the models in quantifying abnormal deformation. Our findings indicate that integrating AI-driven motion estimation with TEE can significantly enhance the precision and efficiency of cardiac function assessment in clinical settings.


Finding an Initial Probe Pose in Teleoperated Robotic Echocardiography via 2D LiDAR-Based 3D Reconstruction

Roshan, Mariadas Capsran, Hidalgo, Edgar M, Isaksson, Mats, Dunn, Michelle, Pyaraka, Jagannatha Charjee

arXiv.org Artificial Intelligence

Echocardiography is a key imaging modality for cardiac assessment but remains highly operator-dependent, and access to trained sonographers is limited in underserved settings. Teleoperated robotic echocardiography has been proposed as a solution; however, clinical studies report longer examination times than manual procedures, increasing diagnostic delays and operator workload. Automating non-expert tasks, such as automatically moving the probe to an ideal starting pose, offers a pathway to reduce this burden. Prior vision- and depth-based approaches to estimate an initial probe pose are sensitive to lighting, texture, and anatomical variability. We propose a robot-mounted 2D LiDAR-based approach that reconstructs the chest surface in 3D and estimates the initial probe pose automatically. To the best of our knowledge, this is the first demonstration of robot-mounted 2D LiDAR used for 3D reconstruction of a human body surface. Through plane-based extrinsic calibration, the transformation between the LiDAR and robot base frames was estimated with an overall root mean square (RMS) residual of 1.8 mm and rotational uncertainty below 0.2°. The chest front surface, reconstructed from two linear LiDAR sweeps, was aligned with non-rigid templates to identify an initial probe pose. A mannequin-based study assessing reconstruction accuracy showed mean surface errors of 2.78 +/- 0.21 mm. Human trials (N=5) evaluating the proposed approach found probe initial points typically 20-30 mm from the clinically defined initial point, while the variation across repeated trials on the same subject was less than 4 mm.


From Transthoracic to Transesophageal: Cross-Modality Generation using LoRA Diffusion

Oladokun, Emmanuel, Ou, Yuxuan, Novikova, Anna, Kulikova, Daria, Thomas, Sarina, Šprem, Jurica, Grau, Vicente

arXiv.org Artificial Intelligence

Deep diffusion models excel at realistic image synthesis but demand large training sets-an obstacle in data-scarce domains like transesophageal echocardiography (TEE). While synthetic augmentation has boosted performance in transthoracic echo (TTE), TEE remains critically underrepresented, limiting the reach of deep learning in this high-impact modality. We address this gap by adapting a TTE-trained, mask-conditioned diffusion backbone to TEE with only a limited number of new cases and adapters as small as $10^5$ parameters. Our pipeline combines Low-Rank Adaptation with MaskR$^2$, a lightweight remapping layer that aligns novel mask formats with the pretrained model's conditioning channels. This design lets users adapt models to new datasets with a different set of anatomical structures to the base model's original set. Through a targeted adaptation strategy, we find that adapting only MLP layers suffices for high-fidelity TEE synthesis. Finally, mixing less than 200 real TEE frames with our synthetic echoes improves the dice score on a multiclass segmentation task, particularly boosting performance on underrepresented right-heart structures. Our results demonstrate that (1) semantically controlled TEE images can be generated with low overhead, (2) MaskR$^2$ effectively transforms unseen mask formats into compatible formats without damaging downstream task performance, and (3) our method generates images that are effective for improving performance on a downstream task of multiclass segmentation.




Recovering Diagnostic Value: Super-Resolution-Aided Echocardiographic Classification in Resource-Constrained Imaging

Babu, Krishan Agyakari Raja, Prabhu, Om, Annu, null, Sivaprakasam, Mohanasankar

arXiv.org Artificial Intelligence

Automated cardiac interpretation in resource-constrained settings (RCS) is often hindered by poor-quality echocardiographic imaging, limiting the effectiveness of downstream diagnostic models. While super-resolution (SR) techniques have shown promise in enhancing magnetic resonance imaging (MRI) and computed tomography (CT) scans, their application to echocardiography--a widely accessible but noise-prone modality--remains underexplored. In this work, we investigate the potential of deep learning-based SR to improve classification accuracy on low-quality 2D echocardiograms. Using the publicly available CAMUS dataset, we stratify samples by image quality and evaluate two clinically relevant tasks of varying complexity: a relatively simple Two-Chamber vs. Four-Chamber (2CH vs. 4CH) view classification and a more complex End-Diastole vs. End-Systole (ED vs. ES) phase classification. We apply two widely used SR models--Super-Resolution Generative Adversarial Network (SRGAN) and Super-Resolution Residual Network (SR-ResNet), to enhance poor-quality images and observe significant gains in performance metric--particularly with SRResNet, which also offers computational efficiency. Our findings demonstrate that SR can effectively recover diagnostic value in degraded echo scans, making it a viable tool for AI-assisted care in RCS, achieving more with less.