vocalization
Is the Rat War Over?
Is the Rat War Over? In New York, a rat czar and new methods have brought down complaints. We may even be ready to appreciate the creatures. Rats were leaving Manhattan, hurrying across the bridges in single-file lines. Some went to Westchester, some to Brooklyn. It was the pandemic, and the rats, which had been living off the nourishing trash of New York's densest borough for generations, were as panicked about the closure of restaurants as we were. People were eating three meals a day at home, and the rats were hungry. At least that was the story going around.
- North America > United States > New York (0.47)
- Asia > Russia (0.14)
- Europe > Norway (0.05)
- (12 more...)
Vocal Call Locator Benchmark (VCL) for localizing rodent vocalizations from multi-channel audio
Understanding the behavioral and neural dynamics of social interactions is a goalof contemporary neuroscience. Many machine learning methods have emergedin recent years to make sense of complex video and neurophysiological data thatresult from these experiments. Less focus has been placed on understanding howanimals process acoustic information, including social vocalizations. A criticalstep to bridge this gap is determining the senders and receivers of acoustic infor-mation in social interactions. While sound source localization (SSL) is a classicproblem in signal processing, existing approaches are limited in their ability tolocalize animal-generated sounds in standard laboratory environments.
WhAM: Towards A Translative Model of Sperm Whale Vocalization
Paradise, Orr, Muralikrishnan, Pranav, Chen, Liangyuan, García, Hugo Flores, Pardo, Bryan, Diamant, Roee, Gruber, David F., Gero, Shane, Goldwasser, Shafi
Sperm whales communicate in short sequences of clicks known as codas. We present WhAM (Whale Acoustics Model), the first transformer-based model capable of generating synthetic sperm whale codas from any audio prompt. WhAM is built by finetuning VampNet, a masked acoustic token model pretrained on musical audio, using 10k coda recordings collected over the past two decades. Through iterative masked token prediction, WhAM generates high-fidelity synthetic codas that preserve key acoustic features of the source recordings. We evaluate WhAM's synthetic codas using Fréchet Audio Distance and through perceptual studies with expert marine biologists. On downstream classification tasks including rhythm, social unit, and vowel classification, WhAM's learned representations achieve strong performance, despite being trained for generation rather than classification. Our code is available at https://github.com/Project-CETI/wham
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Dominica (0.04)
- (22 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Leisure & Entertainment (1.00)
- Media > Music (0.93)
- Health & Medicine (0.67)
- Education (0.67)
Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection
Padovese, Bruno, Frazao, Fabio, Dowd, Michael, Joy, Ruth
Automated detection and classification of marine mammals vocalizations is critical for conservation and management efforts but is hindered by limited annotated datasets and the acoustic complexity of real-world marine environments. Data augmentation has proven to be an effective strategy to address this limitation by increasing dataset diversity and improving model generalization without requiring additional field data. However, most augmentation techniques used to date rely on effective but relatively simple transformations, leaving open the question of whether deep generative models can provide additional benefits. In this study, we evaluate the potential of deep generative for data augmentation in marine mammal call detection including: Variational Autoencoders, Generative Adversarial Networks, and Denoising Diffusion Probabilistic Models. Using Southern Resident Killer Whale (Orcinus orca) vocalizations from two long-term hydrophone deployments in the Salish Sea, we compare these approaches against traditional augmentation methods such as time-shifting and vocalization masking. While all generative approaches improved classification performance relative to the baseline, diffusion-based augmentation yielded the highest recall (0.87) and overall F1-score (0.75). A hybrid strategy combining generative-based synthesis with traditional methods achieved the best overall performance with an F1-score of 0.81. We hope this study encourages further exploration of deep generative models as complementary augmentation strategies to advance acoustic monitoring of threatened marine mammal populations.
- North America > Canada > British Columbia (0.05)
- North America > United States > Alaska (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- (7 more...)
- North America > United States (0.46)
- Asia > Middle East > Iran (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
Towards Leveraging Sequential Structure in Animal Vocalizations
Sarkar, Eklavya, -Doss, Mathew Magimai.
Animal vocalizations contain sequential structures that carry important communicative information, yet most computational bioacoustics studies average the extracted frame-level features across the temporal axis, discarding the order of the sub-units within a vocalization. This paper investigates whether discrete acoustic token sequences, derived through vector quantization and gumbel-softmax vector quantization of extracted self-supervised speech model representations can effectively capture and leverage temporal information. To that end, pairwise distance analysis of token sequences generated from HuBERT embeddings shows that they can discriminate call-types and callers across four bioacoustics datasets. Sequence classification experiments using $k$-Nearest Neighbour with Levenshtein distance show that the vector-quantized token sequences yield reasonable call-type and caller classification performances, and hold promise as alternative feature representations towards leveraging sequential information in animal vocalizations.
- North America > United States (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Sperm whales use vowels like humans, new study finds
Scientists decoding whale clicks found patterns that echo the building blocks of human speech. The marine mammals have a complex communication system that scientists are working to decode. Breakthroughs, discoveries, and DIY tips sent every weekday. A new study discovered a fresh component of their various vocalizations and could hint at potential language structures. Sperm whales exhibit patterns similar to human vowels and diphthongs-a connected pair of vowels in a word, such as the "oi" in .
- South America > Brazil (0.05)
- North America > United States > California > Alameda County > Berkeley (0.05)
- North America > Dominica (0.05)
- (2 more...)
Time-series Random Process Complexity Ranking Using a Bound on Conditional Differential Entropy
Ayers, Jacob, Hahnloser, Richard, Ulrich, Julia, Krapp, Lothar Sebastian, Nitschke, Remo, Stoll, Sabine, Bickel, Balthasar, Furrer, Reinhard
Conditional differential entropy provides an intuitive measure for relatively ranking time-series complexity by quantifying uncertainty in future observations given past context. However, its direct computation for high-dimensional processes from unknown distributions is often intractable. This paper builds on the information theoretic prediction error bounds established by Fang et al. \cite{fang2019generic}, which demonstrate that the conditional differential entropy \textbf{$h(X_k \mid X_{k-1},...,X_{k-m})$} is upper bounded by a function of the determinant of the covariance matrix of next-step prediction errors for any next step prediction model. We add to this theoretical framework by further increasing this bound by leveraging Hadamard's inequality and the positive semi-definite property of covariance matrices. To see if these bounds can be used to rank the complexity of time series, we conducted two synthetic experiments: (1) controlled linear autoregressive processes with additive Gaussian noise, where we compare ordinary least squares prediction error entropy proxies to the true entropies of various additive noises, and (2) a complexity ranking task of bio-inspired synthetic audio data with unknown entropy, where neural network prediction errors are used to recover the known complexity ordering. This framework provides a computationally tractable method for time-series complexity ranking using prediction errors from next-step prediction models, that maintains a theoretical foundation in information theory.
- Europe > Switzerland > Zürich > Zürich (0.16)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Beyond Discrete Categories: Multi-Task Valence-Arousal Modeling for Pet Vocalization Analysis
Traditional pet emotion recognition from vocalizations, based on discrete classification, struggles with ambiguity and capturing intensity variations. We propose a continuous Valence-Arousal (VA) model that represents emotions in a two-dimensional space. Our method uses an automatic VA label generation algorithm, enabling large-scale annotation of 42,553 pet vocalization samples. A multi-task learning framework jointly trains VA regression with auxiliary tasks (emotion, body size, gender) to enhance prediction by improving feature learning. Our Audio Transformer model achieves a validation Valence Pearson correlation of r = 0.9024 and an Arousal r = 0.7155, effectively resolving confusion between discrete categories like "territorial" and "happy." This work introduces the first continuous VA framework for pet vocalization analysis, offering a more expressive representation for human-pet interaction, veterinary diagnostics, and behavioral training. The approach shows strong potential for deployment in consumer products like AI pet emotion translators.
- Health & Medicine > Therapeutic Area (0.68)
- Health & Medicine > Consumer Health (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
- Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.68)
- (2 more...)