Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation

Lin, Xiaoyu, Girin, Laurent, Alameda-Pineda, Xavier

arXiv.org Artificial Intelligence 

In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources. A DVAE model is pre-trained on a single-source dataset to capture the source dynamics. Then, multiple instances of the pre-trained DVAE model are integrated into a multi-source mixture model with a discrete observation-to-source assignment latent variable. The posterior distributions of both the discrete observation-to-source assignment variable and the continuous DVAE variables representing the sources' content/position are estimated using a variational expectation-maximization algorithm, which yields estimates of the multi-source trajectories. We illustrate the versatility of the proposed MixDVAE model on two tasks: a computer vision task, namely multi-object tracking, and an audio processing task, namely single-channel audio source separation. Experimental results show that the proposed method works well on these two tasks and outperforms several baseline methods.
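
To make the role of the discrete observation-to-source assignment variable more concrete, the following minimal sketch computes a posterior over assignments for a single time frame. It is an illustration under simplifying assumptions, not the paper's algorithm: each source is represented by a fixed predicted position (standing in for what a pre-trained DVAE would provide), the observation noise is assumed to be isotropic Gaussian with a known standard deviation `sigma`, and the prior over sources is uniform unless given. The function name `assignment_posterior` and all parameter names are hypothetical.

```python
import math


def assignment_posterior(observations, source_means, sigma=1.0, priors=None):
    """Posterior probability that each observation was generated by each source.

    Assumption: isotropic Gaussian observation model around each source's
    predicted position. In the full model these positions would come from the
    DVAE source variables; here they are fixed points for illustration.
    """
    n_sources = len(source_means)
    if priors is None:
        priors = [1.0 / n_sources] * n_sources  # uniform prior over sources

    posteriors = []
    for obs in observations:
        dim = len(obs)
        log_weights = []
        for mean, prior in zip(source_means, priors):
            sq_dist = sum((o - m) ** 2 for o, m in zip(obs, mean))
            # Log-density of an isotropic Gaussian evaluated at the observation.
            log_lik = (-0.5 * sq_dist / sigma ** 2
                       - 0.5 * dim * math.log(2.0 * math.pi * sigma ** 2))
            log_weights.append(math.log(prior) + log_lik)
        # Normalize in the log domain for numerical stability.
        max_log = max(log_weights)
        weights = [math.exp(v - max_log) for v in log_weights]
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors


# Two hypothetical sources and two observations in a 2-D plane.
source_means = [(0.0, 0.0), (5.0, 5.0)]
observations = [(0.2, -0.1), (4.8, 5.3)]
resp = assignment_posterior(observations, source_means)
```

Each row of `resp` is a probability distribution over sources for one observation. In a variational EM scheme, a step of this kind would alternate with updates of the continuous source variables, with the fixed positions used here replaced by expectations under the current posterior of those variables.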