 soundscape recording


Semi-supervised classification of bird vocalizations

arXiv.org Artificial Intelligence

Changes in bird populations can indicate broader changes in ecosystems, making birds one of the most important animal groups to monitor. Combining machine learning and passive acoustics enables continuous monitoring over extended periods without direct human involvement. However, most existing techniques require extensive expert-labeled datasets for training and cannot easily detect time-overlapping calls in busy soundscapes. We propose a semi-supervised acoustic bird detector designed to allow both the detection of time-overlapping calls (when separated in frequency) and the use of few labeled training samples. The classifier is trained and evaluated on a combination of community-recorded open-source data and long-duration soundscape recordings from Singapore. It outperforms the state-of-the-art BirdNET classifier on a test set of 103 bird species despite significantly fewer labeled training samples. The detector is further tested on 144 microphone-hours of continuous soundscape data. The rich soundscape in Singapore makes suppression of false positives a challenge on raw, continuous data streams. Nevertheless, we demonstrate that achieving high precision in such environments with minimal labeled training data is possible.

Introduction

Biodiversity monitoring is a critical aspect of biodiversity conservation, as it helps inform decision making, improves our knowledge and enhances public education and awareness. Birds are one of the most surveyed animal groups in biodiversity monitoring programmes, with point counts and transect surveys being well-established survey techniques for monitoring bird communities [1]. However, birds can be very difficult to detect and identify especially in tropical regions characterised by high avian diversity and numerous rare species [2], [3]. Additionally, such manned survey techniques are manpower-intensive, require highly specialized expertise, and tend to overlook rare species that are sensitive to human presence [4], [5], [6]. Passive monitoring of biodiversity using acoustics is thus an area of great potential, as various animal groups including birds make unique vocalizations, which can be used to validate their presence.


Towards Deep Active Learning in Avian Bioacoustics

arXiv.org Artificial Intelligence

Passive acoustic monitoring (PAM) in avian bioacoustics enables cost-effective and extensive data collection with minimal disruption to natural habitats. Despite advancements in computational avian bioacoustics, deep learning models continue to encounter challenges in adapting to diverse environments in practical PAM scenarios. This is primarily due to the scarcity of annotations, which requires labor-intensive efforts from human experts. Active learning (AL) reduces annotation cost and speeds up adaptation to diverse scenarios by querying the most informative instances for labeling. This paper outlines a deep AL approach, introduces key challenges, and conducts a small-scale pilot study.
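
The idea of querying the most informative instances can be illustrated with a minimal sketch. The abstract does not say which query strategy the paper uses; entropy-based uncertainty sampling is assumed here purely for illustration, and `select_queries` and the toy `pool` of class probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled clips.

    pool_probs: one list of class probabilities per unlabeled clip
    (hypothetical model output, not taken from the paper).
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Toy pool of three unlabeled clips with predicted class probabilities.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, little to learn
    [0.34, 0.33, 0.33],  # nearly uniform, most informative
    [0.70, 0.20, 0.10],  # moderately uncertain
]
queries = select_queries(pool, budget=2)
print(queries)
```

In an active learning loop, the selected clips would be sent to a human annotator, added to the labeled set, and the model retrained before the next query round.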


BirdSet: A Dataset and Benchmark for Classification in Avian Bioacoustics

arXiv.org Artificial Intelligence

Deep learning (DL) models have emerged as a powerful tool in avian bioacoustics to assess environmental health. To maximize the potential of cost-effective and minimally invasive passive acoustic monitoring (PAM), DL models must analyze bird vocalizations across a wide range of species and environmental conditions. However, data fragmentation challenges a comprehensive evaluation of generalization performance. Therefore, we introduce the BirdSet dataset, comprising approximately 520,000 global bird recordings for training and over 400 hours of PAM recordings for testing. Our benchmark offers baselines for several DL models to enhance comparability and consolidate research across studies, along with code implementations that include comprehensive training and evaluation protocols.


AudioProtoPNet: An interpretable deep learning model for bird sound classification

arXiv.org Artificial Intelligence

Recently, scientists have proposed several deep learning models to monitor the diversity of bird species. These models can detect bird species with high accuracy by analyzing acoustic signals. However, traditional deep learning algorithms are black-box models that provide no insight into their decision-making process. For domain experts, such as ornithologists, it is crucial that these models are not only efficient, but also interpretable in order to be used as assistive tools. In this study, we present an adaptation of the Prototypical Part Network (ProtoPNet) for audio classification that provides inherent interpretability through its model architecture. Our approach is based on a ConvNeXt backbone architecture for feature extraction and learns prototypical patterns for each bird species using spectrograms of the training data. Classification of new data is done by comparison with these prototypes in latent space, which simultaneously serve as easily understandable explanations for the model's decisions. We evaluated the performance of our model on seven different datasets representing bird species from different geographical regions. In our experiments, the model showed excellent results, achieving an average AUROC of 0.82 and an average cmAP of 0.37 across the seven datasets, making it comparable to state-of-the-art black-box models for bird sound classification. Thus, this work demonstrates that even for the challenging task of bioacoustic bird classification, powerful yet interpretable deep learning models can be developed to provide valuable insights to domain experts.
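
Classification by comparison with prototypes in latent space can be sketched roughly as follows. This is a simplified illustration, not the AudioProtoPNet implementation: it assumes a single embedding per clip (a ProtoPNet compares prototypes against patches of a feature map), it assumes cosine similarity as the comparison measure, and the species names and vectors are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(embedding, prototypes):
    """Score each species by its best-matching prototype.

    prototypes: dict mapping a species name to a list of prototype vectors
    (hypothetical values; in a trained model these would be learned).
    Returns the top species and the per-species scores.
    """
    scores = {
        species: max(cosine(embedding, p) for p in protos)
        for species, protos in prototypes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


prototypes = {
    "species_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "species_b": [[0.0, 1.0, 0.0]],
}
label, scores = classify([0.8, 0.2, 0.0], prototypes)
print(label, scores)
```

Because each score traces back to one specific prototype, the best-matching prototype can be shown to a domain expert as the reason for the prediction, which is the interpretability argument made in the abstract.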


Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with Transformers

arXiv.org Artificial Intelligence

We propose a shift towards end-to-end learning in bird sound monitoring by combining self-supervised learning (SSL) and deep active learning (DAL). Leveraging transformer models, we aim to bypass traditional spectrogram conversions, enabling direct raw audio processing. ActiveBird2Vec is set to generate high-quality bird sound representations through SSL, potentially accelerating the assessment of environmental changes and decision-making processes for wind farms. Additionally, we seek to utilize the wide variety of bird vocalizations through DAL, reducing the reliance on extensively labeled datasets by human experts. We plan to curate a comprehensive set of tasks through Huggingface Datasets, enhancing future comparability and reproducibility of bioacoustic research. A comparative analysis between various transformer models will be conducted to evaluate their proficiency in bird sound recognition tasks. We aim to accelerate the progression of avian bioacoustic research and contribute to more effective conservation strategies.


In Search for a Generalizable Method for Source Free Domain Adaptation

arXiv.org Artificial Intelligence

Source-free domain adaptation (SFDA) is compelling because it allows adapting an off-the-shelf model to a new domain using only unlabelled data. In this work, we apply existing SFDA techniques to a challenging set of naturally-occurring distribution shifts in bioacoustics, which are very different from the ones commonly studied in computer vision. We find existing methods perform differently relative to each other than observed in vision benchmarks, and sometimes perform worse than no adaptation at all. We propose a new simple method which outperforms the existing methods on our new shifts while exhibiting strong performance on a range of vision datasets. Our findings suggest that existing SFDA methods are not as generalizable as previously thought and that considering diverse modalities can be a useful avenue for designing more robust models.


Comparing Western and Chinese classical music using deep learning algorithms

#artificialintelligence

Deep learning techniques are proving to be extremely useful for analyzing all kinds of data, ranging from images to text, online posts and audio recordings. These techniques are designed to identify patterns in large datasets, separate items into different categories and make predictions far more quickly than humans. In a recent study, researchers at Simon Fraser University, Academia Sinica and Dartmouth College have applied deep learning techniques to identify similarities and differences between Chinese and Western classical music. Their paper, pre-published on arXiv, presents a comparative analysis of music recordings using sound event detection (SED) and soundscape emotion recognition (SER) models. "We have listened to both Chinese and Western classical music," Jianyu Fan, one of the researchers who carried out the study, told TechXplore.


BirdCLEF 2018 ImageCLEF / LifeCLEF - Multimedia Retrieval in CLEF

#artificialintelligence

As in 2017, two scenarios will be evaluated: (i) the identification of a particular bird specimen in a recording of it, and (ii) the recognition of all specimens singing in a long sequence (up to one hour) of raw soundscapes that can contain tens of birds singing simultaneously. The first scenario is aimed at developing new interactive identification tools to help users and experts, who today are equipped with directional microphones and spend too much time observing and listening to birds to assess their populations in the field. The soundscapes, on the other hand, correspond to a passive monitoring scenario in which any multi-directional audio recording device could be used with little or no user involvement, thus enabling efficient biodiversity assessment. The goal of the task is to identify the species of the most audible bird (i.e. the one that was intended to be recorded) in each of the provided test recordings. Therefore, the evaluated systems have to return a ranked list of possible species for each of the 12,347 test recordings.
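
A ranked list of possible species for one recording can be produced from per-species confidence scores along these lines. This is only a sketch: the function name `ranked_species`, the species names, and the scores are hypothetical, and the official submission format is not described in the text above.

```python
def ranked_species(scores, top_k=None):
    """Return species names ordered from most to least likely.

    scores: dict mapping a species name to a classifier confidence
    (hypothetical values for a single test recording).
    top_k: optionally keep only the first `top_k` entries.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    names = [name for name, _ in ranked]
    return names if top_k is None else names[:top_k]


# Toy classifier output for one recording.
recording_scores = {
    "species_a": 0.12,
    "species_b": 0.71,
    "species_c": 0.05,
    "species_d": 0.33,
}
ranking = ranked_species(recording_scores, top_k=3)
print(ranking)
```

Applying the same function to every test recording would yield one ranked list per recording, which is the output shape the task description asks for.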