AITopics | glottal source waveform

Collaborating Authors

glottal source waveform

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Modeling and Estimation of Vocal Tract and Glottal Source Parameters Using ARMAX-LF Model

Lia, Kai, Akagia, Masato, Lib, Yongwei, Unokia, Masashi

arXiv.org Artificial IntelligenceOct-6-2024

Modeling and estimation of the vocal tract and glottal source parameters of vowels from raw speech can be typically done by using the Auto-Regressive with eXogenous input (ARX) model and Liljencrants-Fant (LF) model with an iteration-based estimation approach. However, the all-pole autoregressive model in the modeling of vocal tract filters cannot provide the locations of anti-formants (zeros), which increases the estimation errors in certain classes of speech sounds, such as nasal, fricative, and stop consonants. In this paper, we propose the Auto-Regressive Moving Average eXogenous with LF (ARMAX-LF) model to extend the ARX-LF model to a wider variety of speech sounds, including vowels and nasalized consonants. The LF model represents the glottal source derivative as a parametrized time-domain model, and the ARMAX model represents the vocal tract as a pole-zero filter with an additional exogenous LF excitation as input. To estimate multiple parameters with fewer errors, we first utilize the powerful nonlinear fitting ability of deep neural networks (DNNs) to build a mapping from extracted glottal source derivatives or speech waveforms to corresponding LF parameters. Then, glottal source and vocal tract parameters can be estimated with fewer estimation errors and without any iterations as in the analysis-by-synthesis strategy. Experimental results with synthesized speech using the linear source-filter model, synthesized speech using the physical model, and real speech signals showed that the proposed ARMAX-LF model with a DNN-based estimation method can estimate the parameters of both vowels and nasalized sounds with fewer errors and estimation time.

armax-lf model, glottal source, speech, (9 more...)

arXiv.org Artificial Intelligence

2410.04704

Country:

Asia > Japan (0.04)
Asia > India (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Analysis and Detection of Pathological Voice using Glottal Source Features

Kadiri, Sudarsana Reddy, Alku, Paavo

arXiv.org Artificial IntelligenceOct-17-2023

Automatic detection of voice pathology enables objective assessment and earlier intervention for the diagnosis. This study provides a systematic analysis of glottal source features and investigates their effectiveness in voice pathology detection. Glottal source features are extracted using glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method, using approximate glottal source signals computed with the zero frequency filtering (ZFF) method, and using acoustic voice signals directly. In addition, we propose to derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF to effectively capture the variations in glottal source spectra of pathological voice. Experiments were carried out using two databases, the Hospital Universitario Principe de Asturias (HUPA) database and the Saarbrucken Voice Disorders (SVD) database. Analysis of features revealed that the glottal source contains information that discriminates normal and pathological voice. Pathology detection experiments were carried out using support vector machine (SVM). From the detection experiments it was observed that the performance achieved with the studied glottal source features is comparable or better than that of conventional MFCCs and perceptual linear prediction (PLP) features. The best detection performance was achieved when the glottal source features were combined with the conventional MFCCs and PLP features, which indicates the complementary nature of the features.

detection, glottal source feature, pathological voice, (10 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/JSTSP.2019.2957988

2309.1408

Country:

Europe > Germany > Saarland > Saarbrücken (0.24)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.66)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)

Add feedback

Investigation of Self-supervised Pre-trained Models for Classification of Voice Quality from Speech and Neck Surface Accelerometer Signals

Kadiri, Sudarsana Reddy, Javanmardi, Farhad, Alku, Paavo

arXiv.org Artificial IntelligenceAug-6-2023

Prior studies in the automatic classification of voice quality have mainly studied the use of the acoustic speech signal as input. Recently, a few studies have been carried out by jointly using both speech and neck surface accelerometer (NSA) signals as inputs, and by extracting MFCCs and glottal source features. This study examines simultaneously-recorded speech and NSA signals in the classification of voice quality (breathy, modal, and pressed) using features derived from three self-supervised pre-trained models (wav2vec2-BASE, wav2vec2-LARGE, and HuBERT) and using a SVM as well as CNNs as classifiers. Furthermore, the effectiveness of the pre-trained models is compared in feature extraction between glottal source waveforms and raw signal waveforms for both speech and NSA inputs. Using two signal processing methods (quasi-closed phase (QCP) glottal inverse filtering and zero frequency filtering (ZFF)), glottal source waveforms are estimated from both speech and NSA signals. The study has three main goals: (1) to study whether features derived from pre-trained models improve classification accuracy compared to conventional features (spectrogram, mel-spectrogram, MFCCs, i-vector, and x-vector), (2) to investigate which of the two modalities (speech vs. NSA) is more effective in the classification task with pre-trained model-based features, and (3) to evaluate whether the deep learning-based CNN classifier can enhance the classification accuracy in comparison to the SVM classifier. The results revealed that the use of the NSA input showed better classification performance compared to the speech signal. Between the features, the pre-trained model-based features showed better classification accuracies, both for speech and NSA inputs compared to the conventional features. It was also found that the HuBERT features performed better than the wav2vec2-BASE and wav2vec2-LARGE features.

artificial intelligence, glottal source waveform, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.csl.2023.101550

2308.03226

Country:

Europe > Finland (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine (0.93)
Government > Regional Government (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.75)

Add feedback