UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions
Myrgyyassov, Alisher; Song, Zhen; Sun, Yu; Wang, Bruce Xiao; Wong, Min Ney; Zheng, Yongping
arXiv.org Artificial Intelligence
Ultrasound tongue imaging (UTI) is a non-invasive and cost-effective tool for studying speech articulation, motor control, and related disorders. However, real-time tongue contour segmentation remains challenging because of low signal-to-noise ratios, imaging variability, and computational demands. We propose UltraUNet, a lightweight encoder-decoder architecture optimized for real-time segmentation of tongue contours in ultrasound images. UltraUNet incorporates domain-specific design choices: lightweight Squeeze-and-Excitation blocks, Group Normalization for stability with small batches, and summation-based skip connections that reduce memory and computational overhead. It runs at 250 frames per second and uses ultrasound-specific augmentations such as denoising and blur simulation. Evaluations on 8 datasets demonstrate high accuracy and robustness, with a single-dataset Dice score of 0.855 and MSD of 0.993 px, and cross-dataset average Dice scores of 0.734 and 0.761. UltraUNet provides a fast, accurate solution for speech research, clinical diagnostics, and the analysis of speech motor disorders.
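The three architectural components named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the feature-map layout (a list of channels, each a flat list of pixels), the helper names `group_norm`, `squeeze_excite`, and `sum_skip`, the omission of biases and of GroupNorm's learnable scale and shift, and all sizes are assumptions made for illustration only.

```python
import math
import random

# Assumed layout: a feature map is a list of C channels, each a flat list of H*W floats.

def group_norm(x, num_groups, eps=1e-5):
    """Group Normalization (no learnable affine terms, an assumption here).
    Mean and variance are computed over each group of channels within one sample,
    so the statistics do not depend on the batch size."""
    c = len(x)
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = c // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in ch] for ch in group])
    return out

def squeeze_excite(x, w1, w2):
    """Squeeze-and-Excitation: global average pool per channel (squeeze),
    bottleneck FC -> ReLU -> FC -> sigmoid (excitation), then rescale each channel.
    w1 has shape [C/r][C], w2 has shape [C][C/r]; biases are omitted (assumption)."""
    squeezed = [sum(ch) / len(ch) for ch in x]
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    return [[g * v for v in ch] for g, ch in zip(gates, x)]

def sum_skip(decoder, encoder):
    """Summation skip connection: element-wise addition keeps C channels,
    whereas concatenating encoder and decoder features would produce 2C."""
    if len(decoder) != len(encoder):
        raise ValueError("channel counts must match for summation")
    return [[d + e for d, e in zip(dch, ech)] for dch, ech in zip(decoder, encoder)]

if __name__ == "__main__":
    random.seed(0)
    C, N, r = 8, 16, 4  # channels, flattened H*W pixels, SE reduction ratio (illustrative)
    enc = [[random.gauss(0, 1) for _ in range(N)] for _ in range(C)]
    dec = [[random.gauss(0, 1) for _ in range(N)] for _ in range(C)]
    w1 = [[random.gauss(0, 0.1) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]
    y = squeeze_excite(group_norm(sum_skip(dec, enc), num_groups=2), w1, w2)
    print(len(y), len(y[0]))  # 8 16
```

The design point the sketch makes concrete is memory: after `sum_skip` the next layer still sees `C` channels rather than `2C`, and `group_norm` produces the same output whether one image or many are processed, which is why it suits small batches.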
Sep-30-2025