Synthetic data enables context-aware bioacoustic sound event detection
Hoffman, Benjamin, Robinson, David, Miron, Marius, Baglione, Vittorio, Canestrari, Daniela, Elias, Damian, Trapote, Eva, Pietquin, Olivier
arXiv.org Artificial Intelligence
We propose a methodology for training foundation models that enhances their in-context learning capabilities within the domain of bioacoustic signal processing. We use synthetically generated training data, introducing a domain-randomization-based pipeline that constructs diverse acoustic scenes with temporally strong labels. We generate over 8.8 thousand hours of strongly-labeled audio and train a query-by-example, transformer-based model to perform few-shot bioacoustic sound event detection. Our second contribution is a public benchmark of 13 diverse few-shot bioacoustics tasks. Our model outperforms previously published methods by 49%, and we demonstrate that this is due to both model design and data scale. We make our trained model available via an API, to provide ecologists and ethologists with a training-free tool for bioacoustic sound event detection.
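The domain-randomization idea described in the abstract (composing diverse acoustic scenes from event and background clips, with temporally strong onset/offset labels) can be sketched roughly as follows. This is a hypothetical illustration of scene synthesis, not the authors' released pipeline; the function name `synthesize_scene`, the SNR range, and the per-scene event counts are all assumptions:

```python
import numpy as np

def synthesize_scene(events, backgrounds, sr=16000, duration=10.0, rng=None):
    """Hypothetical domain-randomized scene synthesis.

    Mixes randomly chosen event clips into a random background at random
    times and randomized gains, returning the mixture together with
    temporally strong labels (onset, offset) in seconds.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = int(sr * duration)
    # Pick a random background and tile/trim it to the scene length.
    bg = backgrounds[rng.integers(len(backgrounds))]
    scene = np.resize(bg, n).astype(np.float64)
    labels = []
    for _ in range(rng.integers(1, 5)):  # random number of events per scene
        clip = np.asarray(events[rng.integers(len(events))], dtype=np.float64)
        start = int(rng.integers(0, n - len(clip)))
        snr_db = rng.uniform(-5.0, 20.0)  # randomized event-to-background level
        gain = 10 ** (snr_db / 20) * np.std(scene) / (np.std(clip) + 1e-9)
        scene[start:start + len(clip)] += gain * clip
        labels.append((start / sr, (start + len(clip)) / sr))  # strong label
    return scene, labels
```

Generating many such scenes from varied source pools is one way to obtain large volumes of strongly labeled audio without manual annotation, in the spirit of the 8.8 thousand hours reported above.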
Feb-28-2025
- Country:
  - Africa > Gabon (0.04)
  - Asia
    - China > Hainan Province (0.04)
    - South Korea > Seoul (0.04)
  - Atlantic Ocean > North Atlantic Ocean > Gulf of St. Lawrence (0.04)
  - Europe > Spain (0.04)
  - North America
    - Canada > Gulf of St. Lawrence (0.04)
    - United States
      - California > Alameda County > Berkeley (0.04)
      - Colorado (0.04)
      - Hawaii (0.04)
      - Indiana > Marion County > Lawrence (0.04)
      - Nevada (0.04)
      - Pennsylvania (0.04)
  - Pacific Ocean > North Pacific Ocean (0.04)
  - South America > Brazil (0.04)
- Genre:
- Research Report (0.83)
- Technology: