AITopics | vocalization

vocalization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The 'Waymo of the sea' tracks sperm whale conversations

The Project CETI glider can autonomously follow sperm whale vocalizations. Sperm whales go deep. They can dive 1,300 to 4,000 feet deep and also travel as much as 15,000 miles per year.

artificial intelligence, glider, whale, (12 more...)

Popular Science

Country: North America > United States > California (0.15)

Genre: Research Report > New Finding (0.50)

Industry:

Transportation > Passenger (0.84)
Transportation > Air (0.84)

Technology: Information Technology > Artificial Intelligence > Robots (0.31)

Vocal Call Locator Benchmark (VCL) for localizing rodent vocalizations from multi-channel audio

Neural Information Processing Systems | Mar-22-2026, 08:26:01 GMT

Understanding the behavioral and neural dynamics of social interactions is a goal of contemporary neuroscience. Many machine learning methods have emerged in recent years to make sense of complex video and neurophysiological data that result from these experiments. Less focus has been placed on understanding how animals process acoustic information, including social vocalizations. A critical step to bridge this gap is determining the senders and receivers of acoustic information in social interactions. While sound source localization (SSL) is a classic problem in signal processing, existing approaches are limited in their ability to localize animal-generated sounds in standard laboratory environments.
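
For readers new to SSL, the classic signal-processing starting point is to estimate the time difference of arrival (TDOA) between two microphones by cross-correlating their signals. The sketch below is a textbook illustration only, not the benchmark's method; the function names, the toy pulse, and the 1000 Hz sample rate are assumptions made for the example.

```python
def cross_correlation_lag(x, y, max_lag):
    """Return the lag (in samples) at which y best aligns with x.

    A positive lag means the sound reached microphone y later than x.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, xi in enumerate(x):
            j = i + lag
            if 0 <= j < len(y):
                score += xi * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def tdoa_seconds(x, y, sample_rate, max_lag):
    """Convert the best-aligning lag into a time difference in seconds."""
    return cross_correlation_lag(x, y, max_lag) / sample_rate


# Hypothetical toy signals: the same pulse, arriving 3 samples later at mic_b.
mic_a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]

lag = cross_correlation_lag(mic_a, mic_b, max_lag=5)
delay = tdoa_seconds(mic_a, mic_b, sample_rate=1000, max_lag=5)
```

With several microphone pairs, such delays constrain where the source can be. Reverberant enclosures tend to blur the correlation peak, which is one reason this simple approach can struggle in laboratory settings.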

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Checklist 1. For all authors (a)

Neural Information Processing Systems | Feb-17-2026, 22:20:46 GMT

We explored whether SSL performance systematically varied as a function of reverberance using acoustic simulations.

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry:

Government (0.94)
Information Technology (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Vocal Call Locator Benchmark (VCL) for localizing rodent vocalizations from multi-channel audio

Ralph E Peterson

Neural Information Processing Systems | Feb-17-2026, 22:20:44 GMT

Here, we present the VCL Benchmark: the first large-scale dataset for benchmarking SSL algorithms in rodents.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Asia > Middle East > Iran (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Speech (0.68)

Is the Rat War Over?

The New Yorker | Feb-12-2026, 11:00:00 GMT

In New York, a rat czar and new methods have brought down complaints. We may even be ready to appreciate the creatures. Rats were leaving Manhattan, hurrying across the bridges in single-file lines. Some went to Westchester, some to Brooklyn. It was the pandemic, and the rats, which had been living off the nourishing trash of New York's densest borough for generations, were as panicked about the closure of restaurants as we were. People were eating three meals a day at home, and the rats were hungry. At least that was the story going around.

artificial intelligence, corrigan, peterson, (15 more...)

The New Yorker

Country:

South America (1.00)
Asia (0.94)
North America > United States > New York (0.47)

Industry:

Government > Regional Government (0.94)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology: Information Technology > Artificial Intelligence (0.46)

Crossing the Species Divide: Transfer Learning from Speech to Animal Sounds

Cauzinille, Jules, Miron, Marius, Pietquin, Olivier, Hagiwara, Masato, Marxer, Ricard, Rey, Arnaud, Favre, Benoit

arXiv.org Artificial Intelligence | Dec-10-2025

Self-supervised speech models have demonstrated impressive performance in speech processing, but their effectiveness on non-speech data remains underexplored. We study the transfer learning capabilities of such models on bioacoustic detection and classification tasks. We show that models such as HuBERT, WavLM, and XEUS can generate rich latent representations of animal sounds across taxa. We analyze the models' properties with linear probing on time-averaged representations. We then extend the approach to account for the effect of time-wise information with other downstream architectures. Finally, we study the implication of frequency range and noise on performance. Notably, our results are competitive with fine-tuned bioacoustic pre-trained models and show the impact of noise-robust pre-training setups. These findings highlight the potential of speech-based self-supervised learning as an efficient framework for advancing bioacoustic research.
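
As a rough illustration of linear probing on time-averaged representations, the sketch below mean-pools per-frame embeddings into one vector per clip and fits a small logistic-regression probe on top of the frozen features. It is not the paper's pipeline: the two-dimensional embeddings are made-up numbers rather than HuBERT, WavLM, or XEUS outputs, and the function names and hyperparameters are assumptions.

```python
import math


def mean_pool(frames):
    """Average a (time x dim) list of frame vectors into one dim-length vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Binary logistic regression on frozen features; returns (weights, bias)."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Return 1 if the probe's score is positive, else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Hypothetical clips: each is a short sequence of 2-d frame embeddings.
clips = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.9, 0.1], [1.0, 0.3]],
    [[0.0, 1.0], [0.2, 0.8]],
    [[0.1, 0.9], [0.3, 1.0]],
]
labels = [0, 0, 1, 1]

features = [mean_pool(clip) for clip in clips]
w, b = train_linear_probe(features, labels)
predictions = [predict(w, b, x) for x in features]
```

Because pooling collapses the time axis, any ordering of the frames within a clip produces the same feature vector, which is the limitation the abstract's time-wise extensions address.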

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.5281/zenodo.17251589

2509.04166

Country: Europe > France (0.15)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.71)

WhAM: Towards A Translative Model of Sperm Whale Vocalization

Paradise, Orr, Muralikrishnan, Pranav, Chen, Liangyuan, García, Hugo Flores, Pardo, Bryan, Diamant, Roee, Gruber, David F., Gero, Shane, Goldwasser, Shafi

arXiv.org Artificial Intelligence | Dec-3-2025

Sperm whales communicate in short sequences of clicks known as codas. We present WhAM (Whale Acoustics Model), the first transformer-based model capable of generating synthetic sperm whale codas from any audio prompt. WhAM is built by finetuning VampNet, a masked acoustic token model pretrained on musical audio, using 10k coda recordings collected over the past two decades. Through iterative masked token prediction, WhAM generates high-fidelity synthetic codas that preserve key acoustic features of the source recordings. We evaluate WhAM's synthetic codas using Fréchet Audio Distance and through perceptual studies with expert marine biologists. On downstream classification tasks including rhythm, social unit, and vowel classification, WhAM's learned representations achieve strong performance, despite being trained for generation rather than classification. Our code is available at https://github.com/Project-CETI/wham

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.02206

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.93)
Health & Medicine (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection

Padovese, Bruno, Frazao, Fabio, Dowd, Michael, Joy, Ruth

arXiv.org Artificial Intelligence | Dec-1-2025

Automated detection and classification of marine mammal vocalizations is critical for conservation and management efforts but is hindered by limited annotated datasets and the acoustic complexity of real-world marine environments. Data augmentation has proven to be an effective strategy to address this limitation by increasing dataset diversity and improving model generalization without requiring additional field data. However, most augmentation techniques used to date rely on effective but relatively simple transformations, leaving open the question of whether deep generative models can provide additional benefits. In this study, we evaluate the potential of deep generative models for data augmentation in marine mammal call detection, including Variational Autoencoders, Generative Adversarial Networks, and Denoising Diffusion Probabilistic Models. Using Southern Resident Killer Whale (Orcinus orca) vocalizations from two long-term hydrophone deployments in the Salish Sea, we compare these approaches against traditional augmentation methods such as time-shifting and vocalization masking. While all generative approaches improved classification performance relative to the baseline, diffusion-based augmentation yielded the highest recall (0.87) and overall F1-score (0.75). A hybrid strategy combining generative-based synthesis with traditional methods achieved the best overall performance with an F1-score of 0.81. We hope this study encourages further exploration of deep generative models as complementary augmentation strategies to advance acoustic monitoring of threatened marine mammal populations.
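
The "traditional" augmentations named above can be pictured with a minimal sketch. The abstract does not specify how time-shifting or masking were implemented, so the circular shift, the constant-fill mask, and all function and parameter names here are assumptions chosen for illustration.

```python
import random


def time_shift(samples, shift):
    """Circularly shift a waveform by `shift` samples (positive = later)."""
    n = len(samples)
    if n == 0:
        return []
    shift %= n
    return samples[-shift:] + samples[:-shift] if shift else list(samples)


def time_mask(samples, start, width, fill=0.0):
    """Overwrite samples[start:start + width] with a constant value."""
    out = list(samples)
    for i in range(max(0, start), min(len(out), start + width)):
        out[i] = fill
    return out


def augment(samples, rng, max_shift, max_mask):
    """Apply a random shift followed by a random mask; length is preserved."""
    shift = rng.randint(-max_shift, max_shift)
    width = rng.randint(0, max_mask)
    start = rng.randint(0, max(0, len(samples) - width))
    return time_mask(time_shift(samples, shift), start, width)


# Hypothetical toy waveform.
clip = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = time_shift(clip, 2)
masked = time_mask(clip, 1, 2)
augmented = augment(clip, random.Random(0), max_shift=2, max_mask=2)
```

Each call to `augment` yields a new variant of the same labeled clip, which is how such transformations enlarge a training set without additional field recordings.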

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2511.21872

Country:

North America > United States (0.67)
Europe (0.67)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.80)

Towards Leveraging Sequential Structure in Animal Vocalizations

Sarkar, Eklavya, Magimai.-Doss, Mathew

arXiv.org Artificial Intelligence | Nov-14-2025

Animal vocalizations contain sequential structures that carry important communicative information, yet most computational bioacoustics studies average the extracted frame-level features across the temporal axis, discarding the order of the sub-units within a vocalization. This paper investigates whether discrete acoustic token sequences, derived through vector quantization and Gumbel-softmax vector quantization of extracted self-supervised speech model representations, can effectively capture and leverage temporal information. To that end, pairwise distance analysis of token sequences generated from HuBERT embeddings shows that they can discriminate call-types and callers across four bioacoustics datasets. Sequence classification experiments using $k$-Nearest Neighbour with Levenshtein distance show that the vector-quantized token sequences yield reasonable call-type and caller classification performances, and hold promise as alternative feature representations towards leveraging sequential information in animal vocalizations.
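
The classification step described above, $k$-Nearest Neighbour over token sequences with Levenshtein distance, can be sketched in a few lines. This is a generic illustration rather than the authors' code: the integer tokens stand in for vector-quantized codebook indices, and the toy sequences, labels, and function names are hypothetical.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


def knn_predict(query, train, k=3):
    """Majority label among the k training sequences closest to `query`."""
    nearest = sorted(train, key=lambda item: levenshtein(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical token sequences with call-type labels.
train = [
    ([3, 3, 7, 1], "call_A"),
    ([3, 7, 7, 1], "call_A"),
    ([3, 3, 7, 2], "call_A"),
    ([9, 4, 4, 8, 8], "call_B"),
    ([9, 4, 8, 8], "call_B"),
    ([9, 9, 4, 8], "call_B"),
]

prediction = knn_predict([3, 3, 7, 7, 1], train)
```

Unlike time-averaged features, the edit distance is sensitive to token order, so two calls built from the same sub-units in a different arrangement are not treated as identical.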

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.1019

Country: Europe > Switzerland (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Sperm whales use vowels like humans, new study finds

Scientists decoding whale clicks found patterns that echo the building blocks of human speech. The marine mammals have a complex communication system that scientists are working to decode. A new study discovered a fresh component of their various vocalizations and could hint at potential language structures. Sperm whales exhibit patterns similar to human vowels and diphthongs, a connected pair of vowels in a word, such as the "oi" in .

artificial intelligence, sperm whale, vowel, (12 more...)

Popular Science

Country: North America > United States > California (0.16)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence (0.50)
