formant




Echoes of Phonetics: Unveiling Relevant Acoustic Cues for ASR via Feature Attribution

Fucci, Dennis, Gaido, Marco, Negri, Matteo, Cettolo, Mauro, Bentivogli, Luisa

arXiv.org Artificial Intelligence

Despite significant advances in ASR, the specific acoustic cues models rely on remain unclear. Prior studies have examined such cues for a limited set of phonemes and on outdated models. In this work, we apply a feature attribution technique to identify the relevant acoustic cues for a modern Conformer-based ASR system. By analyzing plosives, fricatives, and vowels, we assess how feature attributions align with their acoustic properties in the time and frequency domains, which are also essential for human speech perception. Our findings show that the ASR model relies on vowels' full time spans, particularly their first two formants, with greater saliency in male speech. It also better captures the spectral characteristics of sibilant fricatives than non-sibilants and prioritizes the release phase in plosives, especially burst characteristics. These insights enhance the interpretability of ASR models and highlight areas for future research to uncover potential gaps in model robustness.
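
As a rough illustration of the attribution setup such studies use, the sketch below computes a gradient saliency map over a log-mel spectrogram input with PyTorch. The `model`, the tensor shapes, and the token handling are hypothetical stand-ins, not the paper's Conformer pipeline or its attribution method.

```python
import torch

def saliency_map(model, spectrogram, target_ids):
    """|d score / d input| as a time-frequency attribution map (sketch)."""
    x = spectrogram.clone().requires_grad_(True)   # (1, T, n_mels), hypothetical shape
    logits = model(x)                              # (1, T, vocab), hypothetical ASR head
    # Score the reference transcript's tokens and backpropagate to the input.
    score = logits.gather(-1, target_ids.unsqueeze(-1)).sum()
    score.backward()
    return x.grad.abs().squeeze(0)                 # (T, n_mels) saliency
```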


Parsing Through Boundaries in Chinese Word Segmentation

Chen, Yige, Li, Zelong, Yang, Changbing, Zhang, Cindy, Cady, Amandisa, Lee, Ai Ka, Zeng, Zejiao, Pan, Haihua, Park, Jungyeul

arXiv.org Artificial Intelligence

Chinese word segmentation is a foundational task in natural language processing (NLP), with far-reaching effects on syntactic analysis. Unlike alphabetic languages such as English, Chinese lacks explicit word boundaries, making segmentation both necessary and inherently ambiguous. This study highlights the intricate relationship between word segmentation and syntactic parsing, providing a clearer understanding of how different segmentation strategies shape dependency structures in Chinese. Focusing on the Chinese GSD treebank, we analyze multiple word boundary schemes, each reflecting distinct linguistic and computational assumptions, and examine how they influence the resulting syntactic structures. To support detailed comparison, we introduce an interactive web-based visualization tool that displays parsing outcomes across segmentation methods.
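
To make the ambiguity concrete, the classic example below segments the same Chinese string in two ways; jieba serves purely as a stand-in segmenter (it is not the paper's tooling), and whether 研究生 ("graduate student") is treated as one word changes every downstream dependency arc.

```python
import jieba  # stand-in segmenter, used here only to illustrate boundary ambiguity

sentence = "研究生命的起源"  # "to study the origin of life"
print("/".join(jieba.cut(sentence)))                # one plausible segmentation
print("/".join(jieba.cut(sentence, cut_all=True)))  # all candidate word spans
```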


Towards efficient keyword spotting using spike-based time difference encoders

Pequeño-Zurro, Alejandro, Khacef, Lyes, Panzeri, Stefano, Chicca, Elisabetta

arXiv.org Artificial Intelligence

Keyword spotting in edge devices is becoming increasingly important as voice-activated assistants are widely used. However, its deployment is often limited by the extreme low-power constraints of the target embedded systems. Here, we explore the performance of the Temporal Difference Encoder (TDE) in keyword spotting. This recent neuron model encodes the time difference in instantaneous frequency and spike count to perform efficient keyword spotting with neuromorphic processors. We use the TIdigits dataset of spoken digits with a formant decomposition and rate-based encoding into spikes. We compare three Spiking Neural Network (SNN) architectures to learn and classify spatio-temporal signals. The proposed SNN architectures are made of three layers and differ in their hidden layer, which is composed of either (1) feedforward TDE, (2) feedforward Current-Based Leaky Integrate-and-Fire (CuBa-LIF), or (3) recurrent CuBa-LIF neurons. We first show that the spike trains of the frequency-converted spoken digits carry a large amount of information in the temporal domain, reinforcing the importance of better exploiting temporal encoding for such a task. We then train the three SNNs with the same number of synaptic weights to quantify and compare their performance based on accuracy and synaptic operations. The resulting accuracy of the feedforward TDE network (89%) is higher than that of the feedforward CuBa-LIF network (71%) and close to that of the recurrent CuBa-LIF network (91%). However, the feedforward TDE-based network performs 92% fewer synaptic operations than the recurrent CuBa-LIF network with the same number of synapses. In addition, the results of the TDE network are highly interpretable and correlated with the frequency and timescale features of the spoken keywords in the dataset. Our findings suggest that the TDE is a promising neuron model for scalable event-driven processing of spatio-temporal patterns.
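
A minimal discrete-time sketch of the TDE idea (not the authors' implementation): a facilitatory spike starts an exponentially decaying gain trace, a later trigger spike injects synaptic current scaled by that trace, and the resulting burst length encodes the time difference. All time constants and weights below are illustrative.

```python
import numpy as np

def tde_spike_count(dt, tau_gain=0.02, tau_syn=0.01, tau_mem=0.005,
                    step=1e-4, threshold=1.0, sim_time=0.2):
    """Output spikes for a facilitatory spike at t=0 and a trigger spike at t=dt."""
    gain = i_syn = v = 0.0
    spikes = 0
    for k in range(int(sim_time / step)):
        t = k * step
        gain *= np.exp(-step / tau_gain)          # decaying gain trace
        i_syn *= np.exp(-step / tau_syn)          # decaying synaptic current
        if abs(t) < step / 2:
            gain = 1.0                            # facilitatory spike sets the trace
        if abs(t - dt) < step / 2:
            i_syn += 4.0 * gain                   # trigger spike, gated by the trace
        v = v * np.exp(-step / tau_mem) + i_syn * (step / tau_mem)
        if v >= threshold:                        # leaky integrate-and-fire output
            spikes += 1
            v = 0.0
    return spikes

# Shorter facilitatory-to-trigger delays yield longer output bursts.
for dt in (0.005, 0.02, 0.05):
    print(f"delta-t = {dt * 1e3:.0f} ms -> {tde_spike_count(dt)} spikes")
```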


Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses

Ghosh, Suhita, Thiele, Tim, Lorbeer, Frederic, Dreyer, Frank, Stober, Sebastian

arXiv.org Artificial Intelligence

The increasing use of cloud-based speech assistants has heightened the need for effective speech anonymization, which aims to obscure a speaker's identity while retaining critical information for subsequent tasks. One approach to achieving this is through voice conversion. While existing methods often emphasize complex architectures and training techniques, our research underscores the importance of loss functions inspired by the human auditory system. Our proposed loss functions are model-agnostic, incorporating handcrafted and deep learning-based features to effectively capture quality representations. Through objective and subjective evaluations, we demonstrate that a VQVAE-based model, enhanced with our perception-driven losses, surpasses the vanilla model in terms of naturalness, intelligibility, and prosody while maintaining speaker anonymity. These improvements are consistently observed across various datasets, languages, target speakers, and genders.
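
For a flavor of what a handcrafted, perception-oriented term can look like (the paper's exact losses are not reproduced here), the sketch below penalizes log-mel differences at several analysis resolutions, a common proxy for auditory quality; the sample rate and mel settings are illustrative.

```python
import torch
import torchaudio

# Mel front-ends at three analysis resolutions (illustrative settings).
mel_fronts = [torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=n_fft, hop_length=n_fft // 4, n_mels=80)
    for n_fft in (512, 1024, 2048)]

def perceptual_loss(pred_wav, ref_wav, eps=1e-5):
    """Multi-resolution log-mel L1 between predicted and reference waveforms."""
    loss = 0.0
    for mel in mel_fronts:
        p, r = mel(pred_wav), mel(ref_wav)
        loss = loss + torch.mean(torch.abs(torch.log(p + eps) - torch.log(r + eps)))
    return loss / len(mel_fronts)
```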


Explaining Spectrograms in Machine Learning: A Study on Neural Networks for Speech Classification

James, Jesin, T., Balamurali B., Abeysinghe, Binu, Liu, Junchen

arXiv.org Artificial Intelligence

This study investigates discriminative patterns learned by neural networks for accurate speech classification, with a specific focus on vowel classification tasks. By examining the activations and features of neural networks for vowel classification, we gain insights into what the networks "see" in spectrograms. Through the use of class activation mapping, we identify the frequencies that contribute to vowel classification and compare these findings with linguistic knowledge. Experiments on an American English dataset of vowels showcase the explainability of neural networks and provide valuable insights into the causes and characteristics of misclassifications, particularly when differentiating vowels from unvoiced speech. This study not only enhances our understanding of the underlying acoustic cues in vowel classification but also offers opportunities for improving speech recognition by bridging the gap between abstract representations in neural networks and established linguistic knowledge.
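
A compact Grad-CAM sketch of the class-activation idea, assuming a hypothetical PyTorch CNN classifier over spectrogram inputs; `conv_layer` is whichever convolutional layer one wants to probe, and the shapes in comments are assumptions.

```python
import torch

def grad_cam(model, spec, class_idx, conv_layer):
    """Heat map of the time-frequency regions driving one class (sketch)."""
    acts, grads = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    logits = model(spec)                              # spec: (1, 1, freq, time)
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()
    w = grads["g"].mean(dim=(2, 3), keepdim=True)     # per-channel importance
    cam = torch.relu((w * acts["a"]).sum(dim=1)).squeeze(0)
    return cam / (cam.max() + 1e-8)                   # (freq', time') heat map
```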


Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective

Rilliard, Albert, Doukhan, David, Uro, Rémi, Devauchelle, Simon

arXiv.org Artificial Intelligence

We present a diachronic acoustic analysis of the voices of 1023 speakers from French media archives. The speakers are spread across 32 categories based on four periods (years 1955/56, 1975/76, 1995/96, 2015/16), four age groups (20-35; 36-50; 51-65; >65), and two genders. The fundamental frequency ($F_0$) and the first four formants (F1-4) were estimated. Procedures used to ensure the quality of these estimations on heterogeneous data are described. From each speaker's $F_0$ distribution, the base-$F_0$ value was calculated to estimate the register. Average vocal tract length was estimated from formant frequencies. Base-$F_0$ and vocal tract length were fit with linear mixed models to evaluate how they may have changed across time periods and genders, corrected for age effects. Results show an effect of period, with a tendency toward lower voices, independently of gender. A lowering of pitch is observed with age for female but not male speakers.
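
Vocal tract length estimates of this kind typically treat the tract as a uniform tube closed at the glottis, so the k-th resonance is F_k = (2k-1)c/(4L) and each formant gives an estimate L = (2k-1)c/(4F_k) that can be averaged. A sketch with illustrative formant values (not the study's data):

```python
C = 35000.0  # speed of sound in warm, moist air, cm/s

def vtl_from_formants(formants_hz):
    """Average vocal tract length (cm) from the quarter-wavelength tube model."""
    estimates = [(2 * k - 1) * C / (4 * f)
                 for k, f in enumerate(formants_hz, start=1)]
    return sum(estimates) / len(estimates)

# Idealized neutral-vowel formants of a ~17.5 cm tract.
print(f"{vtl_from_formants([500, 1500, 2500, 3500]):.1f} cm")
```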


Comparison of parameters of vowel sounds of Russian and English languages

Fedoseev, V. I., Konev, A. A., Yakimuk, A. Yu.

arXiv.org Artificial Intelligence

In multilingual speech recognition systems, the language is often not known in advance, even though the signal has already been received and is being processed. For such cases, a generalized model is needed that can respond to phonetic differences and, depending on them, correctly recognize speech in the intended language. To build such a model, it is necessary to set the values of phonetic parameters and then compare similar sounds, establishing significant differences.
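
A toy sketch of the kind of cross-language comparison the abstract describes, measuring how far vowel pairs sit apart in F1-F2 space; the formant values are rough textbook figures, not measured parameters from the paper.

```python
import math

# Approximate (F1, F2) values in Hz; illustrative only.
RU = {"и": (240, 2250), "а": (700, 1080), "у": (300, 625)}
EN = {"i": (280, 2250), "ɑ": (710, 1100), "u": (310, 870)}

def f1f2_distance(v1, v2):
    return math.dist(v1, v2)  # Euclidean distance in the F1-F2 plane, Hz

for (rv, rf), (ev, ef) in zip(RU.items(), EN.items()):
    print(f"{rv} vs {ev}: {f1f2_distance(rf, ef):.0f} Hz apart")
```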


No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation

Fucci, Dennis, Gaido, Marco, Negri, Matteo, Cettolo, Mauro, Bentivogli, Luisa

arXiv.org Artificial Intelligence

Automatic speech recognition (ASR) systems are known to be sensitive to the sociolinguistic variability of speech data, in which gender plays a crucial role. This can result in disparities in recognition accuracy between male and female speakers, primarily due to the under-representation of the latter group in the training data. While several solutions have been proposed in the context of hybrid ASR models, the gender bias issue has not been explicitly addressed in end-to-end neural architectures. To fill this gap, we propose a data augmentation technique that manipulates the fundamental frequency (f0) and formants. This technique reduces the data imbalance between genders by simulating voices of the under-represented female speakers and increases the variability within each gender group. Experiments on spontaneous English speech show that our technique yields a relative WER improvement of up to 9.87% for utterances by female speakers, with larger gains for the least-represented f0 ranges.
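
A minimal sketch of the pitch side of such augmentation, using librosa's stock pitch shifter on a hypothetical input file; the formant manipulation the paper performs requires a dedicated vocoder and is not shown here.

```python
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical input file
# Raise f0 by four semitones to simulate a higher-pitched voice.
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)
```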