Sound source detection, localization and classification using consecutive ensemble of CRNN models
Kapka, Sławomir, Lewandowski, Mateusz
Each of these models is a copy of a single SELDnet node with minor adjustments that adapt it to its specific subtask and add regularization. Each model takes as input a fixed-length subsequence of decibel-scale amplitude spectrograms (for the noas and class subtasks) or both decibel-scale amplitude and phase spectrograms (for the doa1 and doa2 subtasks) from all 4 channels. In each case, the input layer is followed by 3 convolutional blocks, each made of a convolutional layer, batch normalization, ReLU activation, max pooling and dropout. The output of the last convolutional block is reshaped into a multivariate sequence of fixed length. For doa2, we additionally concatenate the directions of arrival of the associated events with this multivariate sequence.
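The data flow described above can be followed as shape bookkeeping. The sketch below is illustrative only and is not the authors' implementation: the filter count, the pooling factors, the choice to pool along frequency only, the 'same' convolution padding, the DOA vector size and all function names are assumptions that the text does not specify.

```python
from typing import Optional, Sequence, Tuple

Shape = Tuple[int, int, int]  # (feature_maps, time_frames, frequency_bins)

N_CHANNELS = 4  # from the text: spectrograms from all 4 channels
# From the text: noas/class use amplitude only, doa1/doa2 add phase.
USES_PHASE = {"noas": False, "class": False, "doa1": True, "doa2": True}


def input_shape(subtask: str, n_frames: int, n_bins: int) -> Shape:
    """Shape of one fixed-length input subsequence for a subtask."""
    maps_per_channel = 2 if USES_PHASE[subtask] else 1
    return (N_CHANNELS * maps_per_channel, n_frames, n_bins)


def conv_block_shape(shape: Shape, n_filters: int, freq_pool: int) -> Shape:
    """Shape after conv -> batch norm -> ReLU -> max pool -> dropout.

    Assumptions: the convolution uses 'same' padding and pooling acts on
    the frequency axis only, so the number of time frames is preserved.
    Batch norm, ReLU and dropout do not change the shape.
    """
    _, n_frames, n_bins = shape
    return (n_filters, n_frames, n_bins // freq_pool)


def sequence_shape(
    subtask: str,
    n_frames: int,
    n_bins: int,
    n_filters: int = 64,                      # hypothetical value
    freq_pools: Sequence[int] = (8, 8, 2),    # hypothetical values, 3 blocks
    doa_dim: Optional[int] = None,            # size of the concatenated DOA vector
) -> Tuple[int, int]:
    """Return (sequence_length, features_per_step) after the reshape."""
    shape = input_shape(subtask, n_frames, n_bins)
    for pool in freq_pools:
        shape = conv_block_shape(shape, n_filters, pool)

    n_maps, frames, bins_left = shape
    # Reshape: merge feature maps and remaining frequency bins per time step.
    features = n_maps * bins_left

    if subtask == "doa2":
        if doa_dim is None:
            raise ValueError("doa2 needs the size of the associated DOA vector")
        # doa2 only: directions of arrival are appended to every time step.
        features += doa_dim
    return (frames, features)
```

With these assumed settings, `sequence_shape("class", 128, 512)` gives `(128, 256)`, and `sequence_shape("doa2", 128, 512, doa_dim=2)` gives `(128, 258)`: the sequence length stays fixed, and only doa2 gains extra features per step.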
Aug-2-2019