UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions

Myrgyyassov, Alisher, Song, Zhen, Sun, Yu, Wang, Bruce Xiao, Wong, Min Ney, Zheng, Yongping

arXiv.org Artificial Intelligence 

Ultrasound tongue imaging (UTI) is a non-invasive and cost-effective tool for studying speech articulation, motor control, and related disorders. However, real-time tongue contour segmentation remains challenging due to low signal-to-noise ratios, imaging variability, and computational demands. We propose UltraUNet, a lightweight encoder-decoder architecture optimized for real-time segmentation of tongue contours in ultrasound images. UltraUNet incorporates domain-specific innovations such as lightweight Squeeze-and-Excitation blocks, Group Normalization for small-batch stability, and summation-based skip connections to reduce memory and computational overhead. It achieves 250 frames per second and integrates ultrasound-specific augmentations like denoising and blur simulation. Evaluations on 8 datasets demonstrate high accuracy and robustness, with single-dataset Dice = 0.855 and MSD = 0.993px, and cross-dataset Dice averaging 0.734 and 0.761. UltraUNet provides a fast, accurate solution for speech research, clinical diagnostics, and analysis of speech motor disorders.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found