An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data
