Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes
Roman, Adrian S., Balamurugan, Baladithya, Pothuganti, Rithik
arXiv.org Artificial Intelligence
This technical report details our work towards building an enhanced audio-visual sound event localization and detection (SELD) network. We build on top of the audio-only SELDnet23 model and adapt it to be audio-visual by merging both audio and video information prior to the gated recurrent unit (GRU) of the audio-only network.

... For this reason, the sound localization performance strongly depends on the video content [10]. This makes models prone to erroneous SELD on frames with no audio or uncorrelated audio activity. We introduce a visual branch into the audio-only SELDnet23 baseline from the Classification of Acoustic Scenes and ...
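The fusion step described above, merging audio and video information before the GRU, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `fuse_frames` and `TinyGRU`, the feature sizes, the random weights, and the choice of plain per-frame concatenation are all assumptions made for illustration, and the GRU cell omits bias terms for brevity.

```python
import math
import random


def fuse_frames(audio_feats, visual_feats):
    """Concatenate per-frame audio and visual feature vectors.

    Hypothetical helper: assumes both streams are already aligned to the
    same number of time frames.
    """
    if len(audio_feats) != len(visual_feats):
        raise ValueError("audio and visual streams must have the same number of frames")
    return [a + v for a, v in zip(audio_feats, visual_feats)]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class TinyGRU:
    """Single-layer GRU without biases, for illustration only."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Input (W) and recurrent (U) weights for the update (z),
        # reset (r), and candidate (n) computations.
        self.W = {g: mat(hidden_size, input_size) for g in "zrn"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrn"}

    def step(self, x, h):
        z = [_sigmoid(a + b) for a, b in zip(_matvec(self.W["z"], x), _matvec(self.U["z"], h))]
        r = [_sigmoid(a + b) for a, b in zip(_matvec(self.W["r"], x), _matvec(self.U["r"], h))]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(_matvec(self.W["n"], x), _matvec(self.U["n"], gated_h))]
        # Blend the candidate state with the previous hidden state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, frames):
        h = [0.0] * self.hidden_size
        outputs = []
        for x in frames:
            h = self.step(x, h)
            outputs.append(h)
        return outputs


# Two frames with 3 audio features and 2 visual features each (made-up values).
audio = [[0.1, 0.2, 0.3], [0.0, 0.1, 0.0]]
visual = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse_frames(audio, visual)          # each frame now has 5 features
hidden_states = TinyGRU(input_size=5, hidden_size=4).run(fused)
```

In this sketch the recurrent layer sees one joint feature vector per frame, so its input size is the sum of the audio and visual feature sizes. How the report actually extracts and aligns the two feature streams is not stated in this excerpt.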
Jan-29-2024