AITopics | acoustical society


acoustical society


The Omni-Expert: A Computationally Efficient Approach to Achieve a Mixture of Experts in a Single Expert Model

Neural Information Processing Systems | Jun-22-2026, 02:54:38 GMT

Mixture-of-Experts (MoE) models have become popular in machine learning, boosting performance by partitioning tasks across multiple experts. However, the need for several experts often results in high computational costs, limiting their application on resource-constrained devices with stringent real-time requirements, such as cochlear implants (CIs). We introduce the Omni-Expert (OE), a simple and efficient solution that leverages feature transformations to achieve the 'divide-and-conquer' functionality of a full MoE ensemble in a single expert model. We demonstrate the effectiveness of the OE using phoneme-specific time-frequency masking for speech dereverberation in a CI. Empirical results show that the OE delivers statistically significant improvements in objective intelligibility measures of CI vocoded speech at different levels of reverberation across various speech datasets, at a much reduced computational cost relative to a counterpart MoE.

artificial intelligence, machine learning, natural language, (21 more...)


Country: North America > United States (0.92)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Technology (0.46)
Health & Medicine > Consumer Health (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)


Caterpillars use tiny hairs to hear

Popular Science | Feb-1-2026, 17:08:00 GMT

Experiment in one of the world's quietest rooms reveals the hairs detect airborne sounds, such as those of predators. Have you ever walked into a room full of caterpillars? While the answer for most people is probably no, those who have may have noticed the insects reacting to the sound of their voice. That's what happened to Carol Miles, a biologist at Binghamton University in New York.

artificial intelligence, caterpillar, caterpillar use tiny hair, (10 more...)


Country: North America > United States > New York > Broome County > Binghamton (0.26)

Technology: Information Technology > Artificial Intelligence (0.70)


Perch 2.0 transfers 'whale' to underwater tasks

Burns, Andrea, Harrell, Lauren, van Merriënboer, Bart, Dumoulin, Vincent, Hamer, Jenny, Denton, Tom

arXiv.org Artificial Intelligence | Dec-4-2025

Perch 2.0 is a supervised bioacoustics foundation model pretrained on 14,597 species, including birds, mammals, amphibians, and insects, and has state-of-the-art performance on multiple benchmarks. Given that Perch 2.0 includes almost no marine mammal audio or classes in the training data, we evaluate Perch 2.0 performance on marine mammal and underwater audio tasks through few-shot transfer learning. We perform linear probing with the embeddings generated from this foundation model and compare performance to other pretrained bioacoustics models. In particular, we compare Perch 2.0 with previous multispecies whale, Perch 1.0, SurfPerch, AVES-bio, BirdAVES, and BirdNET V2.3 models, which have open-source tools for transfer learning and agile modeling. We show that the embeddings from the Perch 2.0 model have consistently high performance for few-shot transfer learning, generally outperforming alternative embedding models on the majority of tasks, and the model is thus recommended when developing new linear classifiers for marine mammal classification with few labeled examples.

artificial intelligence, machine learning, perch 2, (18 more...)

arXiv:2512.03219

Country: Pacific Ocean (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
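
The linear probing mentioned in the abstract above can be illustrated with a minimal sketch: the embedding model stays frozen, and only a linear classifier is fit on its outputs. Everything below is an assumption made for illustration. The two-dimensional `toy_embeddings`, the labels, and the function names are hypothetical stand-ins, not Perch 2.0 outputs or the authors' tooling, and plain batch gradient descent stands in for whatever optimizer the paper used.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit binary logistic regression on frozen embeddings (batch gradient descent)."""
    dim = len(embeddings[0])
    n = len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            # Prediction error for this example under the current weights.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i] / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 if the linear score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Hypothetical few-shot data: four labeled "embeddings", two per class.
toy_embeddings = [(2.0, 1.0), (1.5, 2.0), (-1.0, -1.5), (-2.0, -0.5)]
toy_labels = [1, 1, 0, 0]
weights, bias = train_linear_probe(toy_embeddings, toy_labels)
```

Because only `weights` and `bias` are learned, a probe like this can be fit from a handful of labeled examples, which is the few-shot setting the abstract describes.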


Quieter dental drills may be on the horizon

Popular Science | Dec-2-2025, 18:20:00 GMT

The high-pitched whine of dentistry tools creates a lot of anxiety, especially for kids. The fear of going to the dentist is called odontophobia. If the thought of going to the dentist makes your teeth chatter with fear, you're not alone. At least 15 to 20 percent of adults are believed to have odontophobia (aka dental anxiety), which prevents them from maintaining regular cleanings and dental check-ups.

artificial intelligence, dental drill, quieter dental drill, (16 more...)


Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.06)
North America > United States > Hawaii > Honolulu County > Honolulu (0.05)
Asia > Middle East > Republic of Türkiye (0.05)

Genre: Research Report > New Finding (0.36)

Industry: Health & Medicine > Therapeutic Area > Dental and Oral Health (1.00)

Technology: Information Technology > Artificial Intelligence (0.53)


Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection

Padovese, Bruno, Frazao, Fabio, Dowd, Michael, Joy, Ruth

arXiv.org Artificial Intelligence | Dec-1-2025

Automated detection and classification of marine mammal vocalizations is critical for conservation and management efforts but is hindered by limited annotated datasets and the acoustic complexity of real-world marine environments. Data augmentation has proven to be an effective strategy to address this limitation by increasing dataset diversity and improving model generalization without requiring additional field data. However, most augmentation techniques used to date rely on effective but relatively simple transformations, leaving open the question of whether deep generative models can provide additional benefits. In this study, we evaluate the potential of deep generative models for data augmentation in marine mammal call detection, including Variational Autoencoders, Generative Adversarial Networks, and Denoising Diffusion Probabilistic Models. Using Southern Resident Killer Whale (Orcinus orca) vocalizations from two long-term hydrophone deployments in the Salish Sea, we compare these approaches against traditional augmentation methods such as time-shifting and vocalization masking. While all generative approaches improved classification performance relative to the baseline, diffusion-based augmentation yielded the highest recall (0.87) and overall F1-score (0.75). A hybrid strategy combining generative-based synthesis with traditional methods achieved the best overall performance with an F1-score of 0.81. We hope this study encourages further exploration of deep generative models as complementary augmentation strategies to advance acoustic monitoring of threatened marine mammal populations.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv:2511.21872

Country:

North America > United States (0.67)
Europe (0.67)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.80)
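
As a rough illustration of the "traditional" augmentation the abstract above contrasts with generative models, time-shifting can be sketched as a circular shift of the waveform. This is an assumed, generic formulation: the abstract does not say how the authors implement their shift (for example, circular versus zero-padded), and the function names here are hypothetical.

```python
import random


def time_shift(samples, shift):
    """Circularly shift a waveform by `shift` samples (positive moves content later)."""
    samples = list(samples)
    n = len(samples)
    if n == 0:
        return []
    k = shift % n  # normalize negative and oversized shifts
    if k == 0:
        return samples
    return samples[-k:] + samples[:-k]


def random_time_shift(samples, max_shift, rng):
    """Draw a shift uniformly from [-max_shift, max_shift] and apply it."""
    return time_shift(samples, rng.randint(-max_shift, max_shift))


# Usage: produce one augmented copy of a toy clip with a seeded generator.
clip = [0.0, 0.1, 0.4, 0.9, 0.3, 0.0]
augmented = random_time_shift(clip, max_shift=2, rng=random.Random(0))
```

A circular shift preserves every sample and the clip length, so the augmented copy differs from the original only in where the call sits within the window.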


DYNARTmo: A Dynamic Articulatory Model for Visualization of Speech Movement Patterns

Kröger, Bernd J.

arXiv.org Artificial Intelligence | Nov-7-2025

We present DYNARTmo, a dynamic articulatory model designed to visualize speech articulation processes in a two-dimensional midsagittal plane. The model builds upon the UK-DYNAMO framework and integrates principles of articulatory underspecification, segmental and gestural control, and coarticulation. DYNARTmo simulates six key articulators based on ten continuous and six discrete control parameters, allowing for the generation of both vocalic and consonantal articulatory configurations. The current implementation is embedded in a web-based application (SpeechArticulationTrainer) that includes sagittal, glottal, and palatal views, making it suitable for use in phonetics education and speech therapy. While this paper focuses on the static modeling aspects, future work will address dynamic movement generation and integration with articulatory-acoustic modules.

artificial intelligence, dynartmo, kröger, (16 more...)


arXiv:2507.20343

Country:

Europe > Germany (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Speech (0.70)


HergNet: a Fast Neural Surrogate Model for Sound Field Predictions via Superposition of Plane Waves

Calafà, Matteo, Xia, Yuanxin, Jeong, Cheol-Ho

arXiv.org Artificial Intelligence | Oct-29-2025

We present a novel neural network architecture for the efficient prediction of sound fields in two and three dimensions. The network is designed to automatically satisfy the Helmholtz equation, ensuring that the outputs are physically valid. Therefore, the method can effectively learn solutions to boundary-value problems in various wave phenomena, such as acoustics, optics, and electromagnetism. Numerical experiments show that the proposed strategy can potentially outperform state-of-the-art methods in room acoustics simulation, in particular in the range of mid to high frequencies.

Index Terms: Helmholtz equation, wave fields, room acoustics, physics-informed neural networks

1. INTRODUCTION

Several physical phenomena are represented by the propagation of waves, especially in fields like acoustics, optics, quantum mechanics, electromagnetism and surface fluid mechanics [1, 2, 3, 4, 5]. Fast and accurate simulation of wave dynamics is therefore of great relevance to the scientific community, in particular in complex scenarios, where high frequencies, broad domains or long time intervals are considered.

artificial intelligence, machine learning, neural network, (19 more...)

arXiv:2510.24279

Country: Europe > Denmark > Capital Region > Kongens Lyngby (0.14)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
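
The title's "superposition of plane waves" suggests why such outputs can satisfy the Helmholtz equation by construction: each plane wave exp(i k d·x) with a unit direction d solves ∇²u + k²u = 0, and so does any weighted sum of them. The sketch below checks this numerically in two dimensions with a finite-difference Laplacian. It is not the HergNet architecture; the directions, amplitudes, wavenumber, and function names are arbitrary illustrative assumptions.

```python
import cmath
import math


def plane_wave_field(x, y, k, angles, amplitudes):
    """Evaluate sum_j a_j * exp(i k (d_j . (x, y))) with unit directions d_j."""
    total = 0j
    for theta, a in zip(angles, amplitudes):
        dx, dy = math.cos(theta), math.sin(theta)  # unit vector, so |d_j| = 1
        total += a * cmath.exp(1j * k * (dx * x + dy * y))
    return total


def helmholtz_residual(x, y, k, angles, amplitudes, h=1e-3):
    """Five-point finite-difference estimate of laplacian(u) + k^2 u at (x, y)."""
    def u(px, py):
        return plane_wave_field(px, py, k, angles, amplitudes)

    laplacian = (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
                 - 4 * u(x, y)) / h ** 2
    return laplacian + k ** 2 * u(x, y)


# Arbitrary example: three plane waves with complex amplitudes.
angles = [0.0, 1.1, 2.7]
amplitudes = [1.0, 0.5 - 0.2j, 0.3j]
residual = helmholtz_residual(0.4, -0.25, k=2.0, angles=angles, amplitudes=amplitudes)
```

The residual is not exactly zero only because of the finite-difference approximation; it shrinks as `h` is reduced until floating-point round-off takes over. If one direction were not a unit vector, the residual would stay large, which is why the unit-norm constraint matters.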


Translation-Equivariant Self-Supervised Learning for Pitch Estimation with Optimal Transport

Torres, Bernardo, Riou, Alain, Richard, Gaël, Peeters, Geoffroy

arXiv.org Artificial Intelligence | Oct-28-2025

In this paper, we propose an Optimal Transport objective for learning one-dimensional translation-equivariant systems and demonstrate its applicability to single pitch estimation. Our method provides a theoretically grounded, more numerically stable, and simpler alternative for training state-of-the-art self-supervised pitch estimators.

1. INTRODUCTION

Pitch estimation is a core task in audio analysis, long studied in the speech and Music Information Retrieval (MIR) communities [1]. It involves estimating the fundamental frequency of harmonic or quasi-harmonic signals, with traditional methods relying on signal processing techniques to extract harmonicity cues [2-4] or on matching the input spectrum to that of a synthetic waveform [5]. Recently, supervised deep learning approaches leveraging large annotated datasets (such as CREPE [6]) have achieved impressive accuracy, but come with notable challenges. In particular, labeling audio with the temporal precision needed for training (typically within a few milliseconds) is labor-intensive and prone to errors.

artificial intelligence, inductive learning, machine learning, (15 more...)

arXiv:2508.01493

Country:

Europe > France (0.14)
Asia > South Korea (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.42)
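
To give some intuition for why optimal transport suits one-dimensional translation-equivariant problems, the sketch below computes the 1-Wasserstein distance between two histograms on a shared, evenly spaced grid. In one dimension this distance equals the summed absolute difference of the two cumulative distributions, so translating a distribution by s bins yields a distance of exactly s bin widths. This is a generic textbook computation under assumed names and toy data, not the training objective proposed in the paper.

```python
def wasserstein_1d(p, q, bin_width=1.0):
    """W1 distance between two histograms defined on the same evenly spaced grid."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    mass_p, mass_q = sum(p), sum(q)
    if mass_p <= 0 or mass_q <= 0:
        raise ValueError("histograms must have positive total mass")
    cdf_gap = 0.0  # running difference between the two normalized CDFs
    total = 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi / mass_p - qi / mass_q
        total += abs(cdf_gap)
    return total * bin_width


# Toy pitch posterior over 8 bins, and the same shape translated by 2 bins.
original = [0.0, 0.2, 0.6, 0.2, 0.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 0.2, 0.6, 0.2, 0.0, 0.0]
distance = wasserstein_1d(original, shifted)
```

Unlike a bin-wise loss, which treats every non-overlapping pair of distributions as equally wrong, this distance grows in proportion to the size of the shift, which is the property that makes it a natural fit for penalizing pitch-shift errors.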


The Tonogenesis Continuum in Tibetan: A Computational Investigation

Liang, Siyu, Zerong, Zhaxi

arXiv.org Artificial Intelligence | Oct-28-2025

Tonogenesis, the historical process by which segmental contrasts evolve into lexical tone, has traditionally been studied through comparative reconstruction and acoustic phonetics. We introduce a computational approach that quantifies the functional role of pitch at different stages of this sound change by measuring how pitch manipulation affects automatic speech recognition (ASR) performance. By analyzing sensitivity to pitch flattening across a set of closely related Tibetan languages, we find evidence of a tonogenesis continuum: atonal Amdo dialects tolerate pitch removal the most, while fully tonal U-Tsang varieties show severe degradation, and intermediate Kham dialects fall measurably between these extremes. These gradient effects demonstrate how ASR models implicitly learn the shifting functional load of pitch as languages transition from consonant-based to tone-based lexical contrasts. Our findings show that computational methods can capture fine-grained stages of sound change and suggest that traditional functional load metrics, based solely on minimal pairs, may overestimate pitch dependence in transitional systems where segmental and suprasegmental cues remain phonetically intertwined.
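
One simple way to picture pitch flattening is to replace an utterance's fundamental-frequency (F0) contour with a constant value while leaving unvoiced frames untouched. The sketch below does only that contour manipulation, under loud assumptions: the abstract does not describe the authors' procedure, the convention that unvoiced frames are marked with 0 is assumed here, and the resynthesis of audio from the flattened contour is not shown.

```python
def flatten_pitch(f0_contour):
    """Set every voiced frame to the mean voiced F0; unvoiced frames (<= 0) become 0.0."""
    voiced = [f for f in f0_contour if f > 0]
    if not voiced:
        # Nothing to flatten in a fully unvoiced stretch.
        return [0.0 for _ in f0_contour]
    mean_f0 = sum(voiced) / len(voiced)
    return [mean_f0 if f > 0 else 0.0 for f in f0_contour]


# Hypothetical frame-level F0 track in Hz, with 0 marking unvoiced frames.
contour = [0, 100, 200, 0, 300]
flat = flatten_pitch(contour)
```

Comparing recognition error on original versus flattened speech then gives a per-variety sensitivity measure of the kind the abstract reports.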

artificial intelligence, dialect, speech recognition, (16 more...)

arXiv:2510.22485

Country: