emg2speech: synthesizing speech from electromyography using self-supervised speech models

Gowda, Harshavardhana T., Miller, Lee M.

Oct-29-2025–arXiv.org Artificial Intelligence

We present a neuromuscular speech interface that translates electromyographic (EMG) signals collected from orofacial muscles during speech articulation directly into audio. We show that self-supervised speech (SS) representations exhibit a strong linear relationship with the electrical power of muscle action potentials: SS features can be linearly mapped to EMG power with a correlation of $r = 0.85$. Moreover, EMG power vectors corresponding to different articulatory gestures form structured and separable clusters in feature space. This relationship: $\text{SS features}$ $\xrightarrow{\texttt{linear mapping}}$ $\text{EMG power}$ $\xrightarrow{\texttt{gesture-specific clustering}}$ $\text{articulatory movements}$, highlights that SS models implicitly encode articulatory mechanisms. Leveraging this property, we directly map EMG signals to SS feature space and synthesize speech, enabling end-to-end EMG-to-speech generation without explicit articulatory models and vocoder training.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Oct-29-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > California > Yolo County > Davis (0.15)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (1.00)
  - Natural Language (1.00)
  - Machine Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found