 ambisonic


Gen-A: Generalizing Ambisonics Neural Encoding to Unseen Microphone Arrays

Heikkinen, Mikko, Politis, Archontis, Drossos, Konstantinos, Virtanen, Tuomas

arXiv.org Artificial Intelligence

Using deep neural networks (DNNs) for encoding of microphone array (MA) signals to the Ambisonics spatial audio format can surpass certain limitations of established conventional methods, but existing DNN-based methods need to be trained separately for each MA. This paper proposes a DNN-based method for Ambisonics encoding that can generalize to arbitrary MA geometries unseen during training. The method takes as inputs the MA geometry and MA signals and uses a multi-level encoder consisting of separate paths for geometry and signal data, where geometry features inform the signal encoder at each level. The method is validated in simulated anechoic and reverberant conditions with one and two sources. The results indicate improvement over conventional encoding across the whole frequency range for dry scenes, while for reverberant scenes the improvement is frequency-dependent.


Compression of Higher Order Ambisonics with Multichannel RVQGAN

Hirvonen, Toni, Namazi, Mahmoud

arXiv.org Artificial Intelligence

A multichannel extension to the RVQGAN neural coding method is proposed and realized for data-driven compression of third-order Ambisonics audio. The input and output layers of the generator and discriminator models are modified to accept multiple (16) channels without increasing the model bitrate. We also propose a loss function that accounts for spatial perception in immersive reproduction, and transfer learning from single-channel models. Listening test results with 7.1.4 immersive playback show that the proposed extension is suitable for coding scene-based, 16-channel Ambisonics content with good quality at 16 kbps when trained and tested on the EigenScape database. The model has potential applications for learning other types of content and multichannel formats.


HARP: A Large-Scale Higher-Order Ambisonic Room Impulse Response Dataset

Saini, Shivam, Peissig, Jürgen

arXiv.org Artificial Intelligence

This contribution introduces a dataset of 7th-order Ambisonic Room Impulse Responses (HOA-RIRs), created using the Image Source Method. By employing higher-order Ambisonics, our dataset enables precise spatial audio reproduction, a critical requirement for realistic immersive audio applications. Leveraging the virtual simulation, we present a unique microphone configuration, based on the superposition principle, designed to optimize sound field coverage while addressing the limitations of traditional microphone arrays. The presented 64-microphone configuration allows us to capture RIRs directly in the Spherical Harmonics domain. The dataset features a wide range of room configurations, encompassing variations in room geometry, acoustic absorption materials, and source-receiver distances. A detailed description of the simulation setup is provided to support accurate reproduction. The dataset serves as a vital resource for researchers working on spatial audio, particularly in applications involving machine learning to improve room acoustics modeling and sound field synthesis. It further provides a very high level of spatial resolution and realism, crucial for tasks such as source localization, reverberation prediction, and immersive sound reproduction.
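The channel counts quoted in these abstracts (16 channels for third-order content, 64 for the 7th-order dataset) follow from the standard relation that a full 3D Ambisonic signal of order N has (N + 1)^2 spherical harmonic components, with 2n + 1 components at each degree n. A minimal Python sketch of that arithmetic is given here; the function names are illustrative and do not come from any of the listed papers.

```python
def channels_per_degree(degree: int) -> int:
    """Number of spherical harmonic components at a single degree n (2n + 1)."""
    if degree < 0:
        raise ValueError("degree must be non-negative")
    return 2 * degree + 1


def ambisonic_channel_count(order: int) -> int:
    """Total channels of a full 3D Ambisonic signal of the given order.

    Summing 2n + 1 over degrees n = 0..order gives (order + 1) ** 2.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return sum(channels_per_degree(n) for n in range(order + 1))


if __name__ == "__main__":
    for order in (1, 3, 7):
        print(f"order {order}: {ambisonic_channel_count(order)} channels")
```

Under this relation, first-order signals have 4 channels, third-order signals 16, and 7th-order signals 64.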


Sound Event Detection and Localization with Distance Estimation

Krause, Daniel Aleksander, Politis, Archontis, Mesaros, Annamaria

arXiv.org Artificial Intelligence

Sound Event Detection and Localization (SELD) is a combined task of identifying sound events and their corresponding direction-of-arrival (DOA). While this task has numerous applications and has been extensively researched in recent years, it fails to provide full information about the sound source position. In this paper, we overcome this problem by extending the task to Sound Event Detection, Localization with Distance Estimation (3D SELD). We study two ways of integrating distance estimation within the SELD core: a multi-task approach, in which the problem is tackled by a separate model output, and a single-task approach obtained by extending the multi-ACCDOA method to include distance information. We investigate both methods for the Ambisonic and binaural versions of STARSS23: Sony-TAU Realistic Spatial Soundscapes 2023. Moreover, our study involves experiments on the loss function related to the distance estimation part. Our results show that it is possible to perform 3D SELD without any degradation of performance in sound event detection and DOA estimation.


Quaternion Convolutional Neural Networks for Detection and Localization of 3D Sound Events

Comminiello, Danilo, Lella, Marco, Scardapane, Simone, Uncini, Aurelio

arXiv.org Artificial Intelligence

Learning from data in the quaternion domain enables us to exploit internal dependencies of 4D signals and treat them as a single entity. One kind of data that is well suited to quaternion-valued processing is 3D acoustic signals in their spherical harmonics decomposition. In this paper, we address the problem of localizing and detecting sound events in the spatial sound field by using quaternion-valued data processing. In particular, we consider the spherical harmonic components of the signals captured by a first-order ambisonic microphone and process them by using a quaternion convolutional neural network. Experimental results show that the proposed approach exploits the correlated nature of the ambisonic signals, thus improving accuracy in 3D sound event detection and localization.
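Quaternion-valued layers are generally built on the Hamilton product, which mixes all four components of the input with all four components of the weight. The Python sketch below illustrates that product on a single sample; packing the four first-order components into one quaternion in the order (W, X, Y, Z) is an assumption made here for illustration, and the abstract does not state the authors' exact mapping or network design.

```python
from typing import Tuple

Quaternion = Tuple[float, float, float, float]  # (real, i, j, k)


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p * q of two quaternions (not commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )


if __name__ == "__main__":
    # Hypothetical first-order sample packed as (W, X, Y, Z); values are made up.
    sample: Quaternion = (0.5, 0.1, -0.2, 0.3)
    # Hypothetical quaternion weight, standing in for one learned filter tap.
    weight: Quaternion = (1.0, 0.0, 0.5, 0.0)
    print(hamilton_product(weight, sample))
```

Because every output component depends on all four input components through one shared quaternion weight, a layer built on this product treats the four channels as a single entity instead of as independent real-valued inputs.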