Collaborating Authors

 Roman, Adrian S.


Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes

arXiv.org Artificial Intelligence

This technical report details our work towards building an enhanced audio-visual sound event localization and detection (SELD) network. We build on top of the audio-only SELDnet23 model and adapt it to be audio-visual by merging both audio and video information prior to the gated recurrent unit (GRU) of the audio-only network. ... For this reason, the sound localization performance strongly depends on the video content [10]. This makes models prone to erroneous SELD on frames with no audio or uncorrelated audio activity. We introduce a visual branch into the audio-only SELDnet23 baseline from the Classification of Acoustic Scenes and ...
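As a rough illustration of merging audio and video information before a recurrent layer, the sketch below time-aligns two feature sequences and concatenates them along the feature axis. It is only a minimal sketch: the function name `fuse_audio_visual`, the feature dimensions, the frame counts, and the nearest-frame alignment are all assumptions made for this example and are not taken from the report.

```python
import numpy as np

def fuse_audio_visual(audio_feats, video_feats):
    """Concatenate audio and video features frame by frame.

    audio_feats: array of shape (T_audio, D_audio)
    video_feats: array of shape (T_video, D_video)
    Video frames are repeated (nearest earlier frame) so that both
    sequences have T_audio steps. This alignment is an assumption.
    """
    t_audio = audio_feats.shape[0]
    t_video = video_feats.shape[0]
    # Map every audio frame index to the video frame covering the same time.
    idx = np.minimum((np.arange(t_audio) * t_video) // t_audio, t_video - 1)
    aligned_video = video_feats[idx]
    # The fused sequence is what a recurrent layer such as a GRU would consume.
    return np.concatenate([audio_feats, aligned_video], axis=1)

# Hypothetical shapes: 50 audio frames with 128 features, 10 video frames with 16.
audio = np.random.randn(50, 128)
video = np.random.randn(10, 16)
fused = fuse_audio_visual(audio, video)
print(fused.shape)  # (50, 144)
```

Concatenating before the recurrent layer lets a single temporal model see both modalities at every step; other fusion strategies are possible and are not covered by this sketch.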


Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

arXiv.org Artificial Intelligence

Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific rooms. We present SpatialScaper, a library for SELD data simulation and augmentation. Compared to existing tools, SpatialScaper emulates virtual rooms via parameters such as size and wall absorption. This allows for parameterized placement (including movement) of foreground and background sound sources. SpatialScaper also includes data augmentation pipelines that can be applied to existing SELD data. As a case study, we use SpatialScaper to add rooms to the DCASE SELD data. Training a model with our data led to progressive performance improvements as a direct function of acoustic diversity. These results show that SpatialScaper is valuable for training robust SELD models.
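The convolution step described above can be sketched in a few lines of NumPy. This is not SpatialScaper's API: the helper names `spatialize` and `place_event`, the 24 kHz sample rate, the four-channel layout, and the synthetic decaying-noise RIR are all assumptions chosen for illustration.

```python
import numpy as np

def spatialize(event, rir):
    """Convolve a mono event (N,) with a multichannel RIR (L, C).

    Returns an array of shape (N + L - 1, C), one convolved signal per channel.
    """
    channels = [np.convolve(event, rir[:, c]) for c in range(rir.shape[1])]
    return np.stack(channels, axis=1)

def place_event(soundscape, spatial_event, onset):
    """Add a spatialized event into the soundscape at sample index `onset`.

    The event is truncated if it would run past the end of the soundscape.
    The soundscape is modified in place and also returned.
    """
    end = min(onset + spatial_event.shape[0], soundscape.shape[0])
    soundscape[onset:end] += spatial_event[: end - onset]
    return soundscape

sr = 24000  # assumed sample rate
rng = np.random.default_rng(0)

# Stand-in for a recorded sound event: half a second of noise.
event = rng.standard_normal(sr // 2)

# Toy 4-channel RIR: exponentially decaying noise (not a measured or simulated room).
n_rir = 2400
decay = np.exp(-np.arange(n_rir) / 400.0)[:, None]
rir = rng.standard_normal((n_rir, 4)) * decay

# Two-second, four-channel empty soundscape; the event starts at 1 s.
soundscape = np.zeros((2 * sr, 4))
place_event(soundscape, spatialize(event, rir), onset=sr)
```

In a real pipeline the RIR would encode the source position in a given room, so swapping the RIR moves the same event to a different location or room without changing the dry waveform.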