

No Free Lunch from Audio Pretraining in Bioacoustics: A Benchmark Study of Embeddings

Chen, Chenggang, Yang, Zhiyu

arXiv.org Artificial Intelligence

Bioacoustics, the study of animal sounds, offers a non-invasive method to monitor ecosystems. Extracting embeddings from audio-pretrained deep learning (DL) models without fine-tuning has become a popular way to obtain bioacoustic features for downstream tasks. However, a recent benchmark study reveals that while fine-tuned audio-pretrained VGG and transformer models achieve state-of-the-art performance on some tasks, they fail on others. This study benchmarks 11 DL models on the same tasks by reducing the dimensionality of their learned embeddings and evaluating them through clustering. We found that audio-pretrained DL models 1) underperform even a fine-tuned AlexNet when used without fine-tuning, 2) fail to separate background from labeled sounds both with and without fine-tuning, whereas ResNet succeeds, and 3) outperform other models when fewer background sounds are included during fine-tuning. This study underscores the necessity of fine-tuning audio-pretrained models and of inspecting the embeddings after fine-tuning. Our code is available at https://github.com/NeuroscienceAI/Audio_Embeddings
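The evaluation pipeline the abstract describes, reducing the dimensionality of model embeddings and then judging them by how well they cluster, can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic 512-d embeddings stand in for features extracted from a pretrained audio model, and PCA, k-means, and the adjusted Rand index are assumed as representative choices of reducer, clusterer, and clustering metric.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Stand-ins for embeddings from an audio-pretrained model:
# two synthetic "call types" in a 512-d embedding space.
emb = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 512)),
    rng.normal(3.0, 1.0, size=(100, 512)),
])
labels = np.array([0] * 100 + [1] * 100)

# Step 1: reduce the embedding dimensionality.
reduced = PCA(n_components=2, random_state=0).fit_transform(emb)

# Step 2: cluster the reduced embeddings.
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

# Step 3: score cluster/label agreement (1.0 = perfect separation).
ari = adjusted_rand_score(labels, pred)
print(round(ari, 3))
```

A model whose embeddings mix background and labeled sounds, as the abstract reports for several audio-pretrained models, would show up here as a low adjusted Rand index rather than a clean two-cluster split.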


Transferable Models for Bioacoustics with Human Language Supervision

Robinson, David, Robinson, Adelaide, Akrapongpisak, Lily

arXiv.org Artificial Intelligence

Passive acoustic monitoring offers a scalable, non-invasive method for tracking global biodiversity and anthropogenic impacts on species. Although deep learning has become a vital tool for processing this data, current models are inflexible, typically cover only a handful of species, and are limited by data scarcity. In this work, we propose BioLingual, a new model for bioacoustics based on contrastive language-audio pretraining. We first aggregate bioacoustic archives into a language-audio dataset, called AnimalSpeak, with over a million audio-caption pairs holding information on species, vocalization context, and animal behavior. After training on this dataset to connect language and audio representations, our model can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries. When fine-tuned, BioLingual sets a new state-of-the-art on nine tasks in the Benchmark of Animal Sounds. Given its broad taxa coverage and ability to be flexibly queried in human language, we believe this model opens new paradigms in ecological monitoring and research, including free-text search on the world's acoustic monitoring archives. We open-source our models, dataset, and code.
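The zero-shot identification the abstract describes rests on a shared language-audio embedding space: a recording is labeled by the caption whose embedding is most similar to the clip's embedding. The sketch below illustrates only that retrieval step; the toy vectors and the example captions are made up, since the real model's embedding functions are not reproduced here.

```python
import numpy as np

def zero_shot_label(audio_emb, text_embs, labels):
    """Return the caption most similar to one audio clip.

    audio_emb: (d,) embedding of a recording; text_embs: (n, d) embeddings
    of candidate captions. Both are assumed to come from a jointly trained
    language-audio model (contrastive pretraining aligns the two spaces).
    """
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ a  # cosine similarity of each caption to the clip
    return labels[int(np.argmax(sims))]

# Toy example: caption embeddings as unit basis vectors, a clip embedding
# that happens to lie closest to the second caption.
captions = ["humpback whale song", "wood thrush call", "spring peeper chorus"]
text_embs = np.eye(3)
audio_emb = np.array([0.1, 0.9, 0.2])
print(zero_shot_label(audio_emb, text_embs, captions))  # → wood thrush call
```

The same similarity scores, computed over an archive of clip embeddings for a single text query, give the free-text retrieval the authors describe.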


Listening to Nature: The Emerging Field of Bioacoustics

#artificialintelligence

Mitch Aide, a tropical ecologist based in Puerto Rico, thinks we should listen to the earth a lot more than we do now, and not just listen to it, but record and store its sounds on a massive scale. His aims are not spiritual but scientific: he, his colleagues, and other experts are developing and deploying audio recorders, data transmission systems, and new artificial intelligence software that together are rapidly expanding scientists' ability to understand ecosystems by listening to them. Today, Aide can nail a cheap digital audio recorder to a tree in Puerto Rico's Luquillo Forest and transmit its recordings to a computer running prototype software, which indicates almost in real time whether any of 25 species of frogs and birds are vocalizing in the forest. The system's apparent simplicity belies its power: Aide thinks that it and similar systems will allow scientists to monitor ecosystems in ways we can't yet imagine. He dreams that one day soon, audio recordings of natural soundscapes will be like rainfall and temperature data, collected from a worldwide network of permanent stations, widely available for analysis, and permanently archived.