AITopics | speech inversion

Collaborating Authors

speech inversion

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Articulation-Informed ASR: Integrating Articulatory Features into ASR via Auxiliary Speech Inversion and Cross-Attention Fusion

Attia, Ahmed Adel, Liu, Jing, Wilson, Carol Espy

arXiv.org Artificial IntelligenceOct-13-2025

ABSTRACT Prior works have investigated the use of articulatory features as complementary representations for automatic speech recognition (ASR), but their use was largely confined to shallow acoustic models. In this work, we revisit articulatory information in the era of deep learning and propose a framework that leverages articulatory representations both as an auxiliary task and as a pseudo-input to the recognition model. Specifically, we employ speech inversion as an auxiliary prediction task, and the predicted articulatory features are injected into the model as a query stream in a cross-attention module with acoustic embeddings as keys and values. Experiments on LibriSpeech demonstrate that our approach yields consistent improvements over strong transformer-based baselines, particularly under low-resource conditions. These findings suggest that articulatory features, once sidelined in ASR research, can provide meaningful benefits when reintroduced with modern architectures.

artificial intelligence, machine learning, representation, (15 more...)

arXiv.org Artificial Intelligence

2510.08585

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech

Weise, Tobias, Klumpp, Philipp, Demir, Kubilay Can, Pérez-Toro, Paula Andrea, Schuster, Maria, Noeth, Elmar, Heismann, Bjoern, Maier, Andreas, Yang, Seung Hee

arXiv.org Artificial IntelligenceJul-3-2024

This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task learning setup, with the end-to-end goal of taking raw speech as input and estimating the corresponding articulatory movements, phoneme sequence, and phoneme alignment. While both proposed approaches share these same requirements, they differ in their way of achieving phoneme-related predictions: one is based on frame classification, the other on a two-staged training procedure and forced alignment. We reach competitive performance of 0.73 mean correlation for the AAI task and achieve up to approximately 87% frame overlap compared to a state-of-the-art text-dependent phoneme force aligner.

alignment, phoneme sequence, speech inversion, (10 more...)

arXiv.org Artificial Intelligence

2407.03132

Country:

Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
South America > Colombia > Antioquia Department > Medellín (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables

Attia, Ahmed Adel, Siriwardena, Yashish M., Espy-Wilson, Carol

arXiv.org Artificial IntelligenceSep-17-2023

The performance of deep learning models depends significantly on their capacity to encode input features efficiently and decode them into meaningful outputs. Better input and output representation has the potential to boost models' performance and generalization. In the context of acoustic-to-articulatory speech inversion (SI) systems, we study the impact of utilizing speech representations acquired via self-supervised learning (SSL) models, such as HuBERT compared to conventional acoustic features. Additionally, we investigate the incorporation of novel tract variables (TVs) through an improved geometric transformation model. By combining these two approaches, we improve the Pearson product-moment correlation (PPMC) scores which evaluate the accuracy of TV estimation of the SI system from 0.7452 to 0.8141, a 6.9% increase. Our findings underscore the profound influence of rich feature representations from SSL models and improved geometric transformations with target TVs on the enhanced functionality of SI systems.

dataset, representation, transformation, (16 more...)

arXiv.org Artificial Intelligence

2309.0922

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > Wisconsin (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback