wav2pos: Sound Source Localization using Masked Autoencoders

Berg, Axel, Gulin, Jens, O'Connor, Mark, Zhou, Chuteng, Åström, Karl, Oskarsson, Magnus

Aug-28-2024, arXiv.org Artificial Intelligence

Abstract: We present a novel approach to the 3D sound source localization task for distributed ad-hoc microphone arrays by formulating it as a set-to-set regression problem. By training a multi-modal masked autoencoder model that operates on audio recordings and microphone coordinates, we show that this formulation allows accurate localization of the sound source by reconstructing coordinates that are masked in the input. Our approach is flexible in the sense that a single model can be used with an arbitrary number of microphones, even when a subset of the audio recordings and microphone coordinates is missing. We test our method on simulated and real-world recordings of music and speech in indoor environments, and demonstrate competitive performance compared to both classical and other learning-based localization methods.

[Figure: predictions on the music3 recording from the LuViRa dataset [6], viewed from above.]

Mapping, positioning and localization are key enabling technologies for a wide range of applications.
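The set-to-set formulation in the abstract can be illustrated with a small input-masking sketch. This is not the authors' implementation: the function name `mask_inputs`, the zero fill value, the array shapes, and the convention that element 0 represents the sound source are all assumptions made for illustration. The sketch only shows how a set of (audio, coordinate) pairs could be partially hidden before being passed to a model that is trained to reconstruct the hidden entries.

```python
import numpy as np

def mask_inputs(audio, coords, masked_audio, masked_coords, fill=0.0):
    """Hide selected audio rows and coordinate rows of a set of elements.

    audio:  (n, t) array, one recording per element (hypothetical layout)
    coords: (n, 3) array, one 3D position per element
    Returns masked copies plus boolean masks marking what was hidden.
    """
    audio = np.array(audio, dtype=float)    # np.array copies, inputs stay intact
    coords = np.array(coords, dtype=float)
    n = audio.shape[0]

    audio_mask = np.zeros(n, dtype=bool)
    coord_mask = np.zeros(n, dtype=bool)
    audio_mask[np.asarray(list(masked_audio), dtype=int)] = True
    coord_mask[np.asarray(list(masked_coords), dtype=int)] = True

    # Assumption: masked entries are replaced by a constant fill value.
    # A learned mask token would be another option.
    audio[audio_mask] = fill
    coords[coord_mask] = fill
    return audio, coords, audio_mask, coord_mask

# Toy set: element 0 stands for the source, elements 1-3 for microphones.
rng = np.random.default_rng(0)
audio = rng.standard_normal((4, 8))
coords = rng.uniform(0.0, 5.0, size=(4, 3))

# Hide the source audio and position, plus the position of microphone 2.
a, c, am, cm = mask_inputs(audio, coords, masked_audio=[0], masked_coords=[0, 2])
```

Under these assumptions, localizing the source amounts to predicting the hidden row `c[0]` from the remaining visible entries, and the same mechanism covers a microphone whose position is missing.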

localization, microphone, source localization


