frame length


Optimising MFCC parameters for the automatic detection of respiratory diseases

Yan, Yuyang, Simons, Sami O., van Bemmel, Loes, Reinders, Lauren, Franssen, Frits M. E., Urovi, Visara

arXiv.org Artificial Intelligence

Voice signals originating from the respiratory tract are utilized as valuable acoustic biomarkers for the diagnosis and assessment of respiratory diseases. Among the employed acoustic features, Mel Frequency Cepstral Coefficients (MFCC) are widely used for automatic analysis, with MFCC extraction commonly relying on default parameters. However, no comprehensive study has systematically investigated the impact of MFCC extraction parameters on respiratory disease diagnosis. In this study, we address this gap by examining the effects of key parameters, namely the number of coefficients, the frame length, and the hop length between frames, on the examination of respiratory conditions. Our investigation uses four datasets: the Cambridge COVID-19 Sound database, the Coswara dataset, the Saarbruecken Voice Disorders (SVD) database, and the TACTICAS dataset. The Support Vector Machine (SVM) is employed as the classifier, given its widespread adoption and efficacy. Our findings indicate that the accuracy of MFCC decreases as the hop length increases, and the optimal number of coefficients is observed to be approximately 30. The performance of MFCC varies with frame length across the datasets: for the COVID-19 datasets (Cambridge COVID-19 Sound database and Coswara dataset), performance declines with longer frame lengths, while for the SVD dataset, performance improves with increasing frame length (from 50 ms to 500 ms). Furthermore, we investigate the optimized combination of these parameters and observe substantial enhancements in accuracy. Compared to the worst combination, the SVM model achieves an accuracy of 81.1%, 80.6%, and 71.7%, with improvements of 19.6%, 16.1%, and 14.9% for the Cambridge COVID-19 Sound database, the Coswara dataset, and the SVD dataset, respectively.
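To make the three parameters concrete, the following is a minimal, self-contained MFCC sketch in NumPy (framing → Hamming window → power spectrum → triangular mel filterbank → log → DCT-II). It is an illustrative implementation, not the paper's pipeline; the function signature and defaults (25 ms frames, 10 ms hop, 40 mel bands) are common conventions, not values taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_mfcc=13, frame_ms=25.0, hop_ms=10.0, n_mels=40):
    """Minimal MFCC: frame -> window -> |FFT|^2 -> mel filterbank -> log -> DCT-II.
    The three parameters studied in the paper map to n_mfcc, frame_ms, hop_ms."""
    frame_len = int(sr * frame_ms / 1000.0)
    hop_len = int(sr * hop_ms / 1000.0)
    n_fft = int(2 ** np.ceil(np.log2(frame_len)))
    # Slice the signal into overlapping frames of frame_len, spaced hop_len apart
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2        # (n_frames, n_fft//2+1)
    # Triangular mel filterbank spanning 0 .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):
            fbank[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)              # (n_frames, n_mels)
    # DCT-II over the mel bands; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ dct.T                                 # (n_frames, n_mfcc)

# Example: a longer hop reduces the number of frames (coarser time resolution)
sr = 16000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)           # 1 s test tone
print(mfcc(x, sr, n_mfcc=30, frame_ms=25.0, hop_ms=10.0).shape)  # (98, 30)
print(mfcc(x, sr, n_mfcc=30, frame_ms=25.0, hop_ms=20.0).shape)  # (49, 30)
```

Doubling the hop halves the number of frames the classifier sees, which is one way the hop-length effect reported above can arise.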


Phase-Aware Deep Speech Enhancement: It's All About The Frame Length

Peer, Tal, Gerkmann, Timo

arXiv.org Artificial Intelligence

Algorithmic latency in speech processing is dominated by the frame length used for Fourier analysis, which in turn limits the achievable performance of magnitude-centric approaches. As previous studies suggest that the importance of phase grows with decreasing frame length, this work presents a systematic study of the contribution of phase and magnitude in modern Deep Neural Network (DNN)-based speech enhancement at different frame lengths. Results indicate that DNNs can successfully estimate phase when using short frames, with similar or better overall performance compared to using longer frames. Thus, interestingly, modern phase-aware DNNs allow for low-latency speech enhancement at high quality.
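The magnitude/phase trade-off the abstract describes can be illustrated with a plain STFT round trip: reconstructing from magnitude and the true phase is near-lossless, while discarding phase (keeping only the magnitude) degrades the signal. This is a minimal SciPy sketch, not the paper's DNN setup; the SNR metric and the 440 Hz test tone are illustrative choices.

```python
import numpy as np
from scipy.signal import stft, istft

def roundtrip_snr(x, sr, frame_len, keep_phase):
    """STFT -> (optionally drop phase) -> iSTFT, and report SNR in dB.
    The algorithmic latency of such a system scales with frame_len / sr."""
    _, _, Z = stft(x, fs=sr, nperseg=frame_len)
    if not keep_phase:
        Z = np.abs(Z)                      # magnitude only, zero phase
    _, x_rec = istft(Z, fs=sr, nperseg=frame_len)
    n = min(len(x), len(x_rec))
    err = x[:n] - x_rec[:n]
    return 10 * np.log10(np.sum(x[:n] ** 2) / (np.sum(err ** 2) + 1e-12))

sr = 16000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
# 128-sample frames at 16 kHz correspond to 8 ms of analysis latency
print(roundtrip_snr(x, sr, 128, keep_phase=True))   # near-perfect
print(roundtrip_snr(x, sr, 128, keep_phase=False))  # clearly degraded
```

In a magnitude-centric enhancer the dropped phase must be approximated (e.g. by reusing the noisy phase); the paper's point is that at short frames a DNN can estimate phase well enough that this approximation stops being the bottleneck.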


Non-Autoregressive Sign Language Production via Knowledge Distillation

Hwang, Eui Jun, Kim, Jung Ho, Cho, Suk Min, Park, Jong C.

arXiv.org Artificial Intelligence

Sign Language Production (SLP) aims to translate expressions in spoken language into corresponding ones in sign language, such as skeleton-based sign poses or videos. Existing SLP models are either AutoRegressive (AR) or Non-Autoregressive (NAR). However, AR-SLP models suffer from regression to the mean and error propagation during decoding. NSLP-G, a NAR-based model, resolves these issues to some extent but engenders other problems. For example, it does not consider target sign lengths and suffers from false decoding initiation. We propose a novel NAR-SLP model via Knowledge Distillation (KD) to address these problems. First, we devise a length regulator to predict the end of the generated sign pose sequence. We then adopt KD, which distills spatial-linguistic features from a pre-trained pose encoder to alleviate false decoding initiation. Extensive experiments show that the proposed approach significantly outperforms existing SLP models in both Fréchet Gesture Distance and Back-Translation evaluation.
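The two ingredients named above, feature-level distillation and a length regulator, can be sketched together in a few lines. This is a hypothetical NumPy illustration: the shapes, the MSE distillation objective, and the `valid_len` masking are assumptions for clarity, not the paper's actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: T frames of D-dimensional features. The "teacher" stands
# in for the frozen pre-trained pose encoder; the "student" for the NAR decoder.
T, D = 50, 64
teacher_feat = rng.normal(size=(T, D))
student_feat = teacher_feat + 0.1 * rng.normal(size=(T, D))

def distill_loss(student, teacher, valid_len=None):
    """Feature-level KD: mean squared error between student and frozen teacher
    features. valid_len mimics the length regulator, which predicts where the
    sign sequence ends so that padded frames are excluded from the loss."""
    if valid_len is not None:
        student, teacher = student[:valid_len], teacher[:valid_len]
    return float(np.mean((student - teacher) ** 2))

print(distill_loss(student_feat, teacher_feat))            # full-sequence loss
print(distill_loss(student_feat, teacher_feat, valid_len=30))  # masked to predicted length
```

Distilling against a teacher that has seen real poses gives the NAR decoder a well-formed target from the very first frame, which is how KD helps against false decoding initiation.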