Mondal, Arnab Kumar
To Regularize or Not To Regularize? The Bias Variance Trade-off in Regularized AEs
Mondal, Arnab Kumar, Asnani, Himanshu, Singla, Parag, AP, Prathosh
Regularized Auto-Encoders (RAEs) form a rich class of neural generative models. They effectively model the joint distribution between the data and the latent space using an Encoder-Decoder combination, with regularization imposed in terms of a prior over the latent space. Despite their advantages, such as stability in training, the performance of AE-based models has not reached that of other generative models such as Generative Adversarial Networks (GANs). Motivated by this, we examine in this paper the effect of the latent prior on the generation quality of deterministic AE models. Specifically, we consider the class of RAEs with deterministic Encoder-Decoder pairs, Wasserstein Auto-Encoders (WAE), and show that fixing the prior distribution \textit{a priori}, oblivious to the dimensionality of the `true' latent space, leads to the infeasibility of the optimization problem considered. Further, we show that, in the finite-data regime, even when the correct latent dimensionality is known, imposing any arbitrary prior introduces a bias-variance trade-off. As a remedy to both issues, we introduce an additional state space, in the form of a flexibly learnable latent prior, into the optimization objective of the WAE. We learn the distribution of the latent prior implicitly and jointly with the AE training, which not only makes the learning objective feasible but also allows operation at different points of the bias-variance curve. We show the efficacy of our model, called FlexAE, through several experiments on multiple datasets, and demonstrate that it is the new state of the art among AE-based generative models.
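To make the learnable-prior idea concrete, here is a minimal PyTorch sketch of a WAE-style objective in which the latent prior is itself a small generator network trained jointly with the deterministic Encoder-Decoder pair. It uses an RBF-kernel MMD penalty (as in WAE-MMD) as the latent-matching term rather than FlexAE's exact objective, and all module names (FlexWAE, prior_gen), dimensions, and hyper-parameters are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

def rbf_mmd(a, b, sigma=1.0):
    """Biased RBF-kernel MMD^2 estimate between two batches of latent codes."""
    def k(u, v):
        return torch.exp(-torch.cdist(u, v).pow(2) / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

class FlexWAE(nn.Module):
    """Deterministic AE with a jointly learned latent prior (illustrative sizes)."""
    def __init__(self, x_dim=784, z_dim=16, noise_dim=8, h=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU(), nn.Linear(h, z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, x_dim))
        # Learnable prior: a small generator mapping simple noise to latent samples.
        self.prior_gen = nn.Sequential(nn.Linear(noise_dim, h), nn.ReLU(), nn.Linear(h, z_dim))
        self.noise_dim = noise_dim

    def loss(self, x, lam=1.0):
        z = self.enc(x)                                   # aggregate-posterior samples
        x_hat = self.dec(z)                               # reconstructions
        eps = torch.randn(x.size(0), self.noise_dim, device=x.device)
        z_prior = self.prior_gen(eps)                     # samples from the learned prior
        return (x_hat - x).pow(2).mean() + lam * rbf_mmd(z, z_prior)
```

Because the prior is learned jointly with the AE, generation amounts to decoding prior samples, e.g. model.dec(model.prior_gen(torch.randn(n, 8))).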
RespVAD: Voice Activity Detection via Video-Extracted Respiration Patterns
Mondal, Arnab Kumar, P, Prathosh A.
Voice Activity Detection (VAD) refers to the task of identifying regions of human speech in digital signals such as audio and video. While VAD is a necessary first step in many speech processing systems, it becomes challenging when there are high levels of ambient noise during the audio recording. To improve the performance of VAD in such conditions, several methods have been proposed that utilize visual information extracted from the region surrounding the speakers' mouth/lips in video recordings. Although these provide advantages over audio-only methods, they depend on faithful extraction of the lip/mouth regions. Motivated by this, a new paradigm for VAD is proposed, based on the fact that respiration forms the primary source of energy for speech production. Specifically, an audio-independent VAD technique using the respiration pattern extracted from the speaker's video is developed. The respiration pattern is first extracted from video focusing on the abdominal-thoracic region of a speaker using an optical-flow-based method. Subsequently, voice activity is detected from the respiration-pattern signal using neural sequence-to-sequence prediction models. The efficacy of the proposed method is demonstrated through experiments on a challenging dataset recorded in real acoustic environments, with comparisons against four previous methods based on audio and visual cues.
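The two-stage pipeline maps naturally onto code. Below is a rough sketch, assuming OpenCV's dense Farnebäck optical flow for extracting the respiration pattern and a bidirectional GRU as the sequence model; the ROI coordinates, flow parameters, and network sizes are placeholders rather than the settings used in the paper.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def respiration_signal(video_path, roi):
    """One respiration sample per frame: mean vertical optical flow in a chest/abdomen ROI."""
    x0, y0, x1, y1 = roi
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    samples = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        samples.append(float(flow[y0:y1, x0:x1, 1].mean()))   # vertical flow component
        prev = gray
    cap.release()
    return np.asarray(samples, dtype=np.float32)

class RespVADNet(nn.Module):
    """Frame-wise voice-activity probabilities from the 1-D respiration signal."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(1, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, sig):                                   # sig: (batch, time)
        h, _ = self.rnn(sig.unsqueeze(-1))
        return torch.sigmoid(self.head(h)).squeeze(-1)        # (batch, time) speech probs
```

A typical call would pass a hand-picked abdominal-thoracic bounding box, e.g. probs = RespVADNet()(torch.from_numpy(respiration_signal('clip.mp4', roi=(100, 200, 300, 420))).unsqueeze(0)); the file name and box are placeholders.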
C-MI-GAN: Estimation of Conditional Mutual Information Using MinMax Formulation
Mondal, Arnab Kumar, Bhattacharya, Arnab, Mukherjee, Sudipto, AP, Prathosh, Kannan, Sreeram, Asnani, Himanshu
Estimation of information-theoretic quantities such as mutual information (MI) and its conditional variant has drawn interest in recent times owing to their multifaceted applications. Newly proposed neural estimators for these quantities have overcome severe drawbacks of classical kNN-based estimators in high dimensions. In this work, we focus on estimating conditional mutual information (CMI), which quantifies the degree of dependence between two random variables X and Y given a third variable Z, by utilizing its formulation as a minmax optimization problem. CMI provides a strong theoretical guarantee, $I(X;Y|Z) = 0 \iff X \perp Y \mid Z$, so one motivation for estimating it is its use in conditional independence (CI) testing and in detecting causal associations.
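One plausible way to instantiate the minmax formulation, sketched below in PyTorch, pairs a conditional generator approximating p(y|z) with a statistic network that maximizes a Donsker-Varadhan lower bound on the divergence between samples from p(x, y, z) and samples in which y is replaced by the generator's output; the generator is trained adversarially against the same bound. This is a simplified reading for illustration, not the paper's exact objective, and every function name, dimension, and update rule here is an assumption.

```python
import math
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, h=64):
    return nn.Sequential(nn.Linear(in_dim, h), nn.ReLU(),
                         nn.Linear(h, h), nn.ReLU(), nn.Linear(h, out_dim))

def dv_bound(T, x, y, z, y_fake):
    """Donsker-Varadhan lower bound E_p[T] - log E_q[exp(T)], where q replaces y
    by the generator's conditionally generated sample."""
    t_joint = T(torch.cat([x, y, z], dim=1)).squeeze(-1)
    t_fake = T(torch.cat([x, y_fake, z], dim=1)).squeeze(-1)
    return t_joint.mean() - (torch.logsumexp(t_fake, dim=0) - math.log(t_fake.numel()))

def train_step(T, G, opt_T, opt_G, x, y, z, noise_dim=8):
    eps = torch.randn(z.size(0), noise_dim, device=z.device)
    y_fake = G(torch.cat([z, eps], dim=1))
    # Statistic network ascends the bound (tightens the CMI estimate).
    opt_T.zero_grad()
    (-dv_bound(T, x, y, z, y_fake.detach())).backward()
    opt_T.step()
    # Generator descends the same bound, pushing its samples toward p(y|z).
    opt_G.zero_grad()
    loss_G = dv_bound(T, x, y, z, y_fake)
    loss_G.backward()
    opt_G.step()
    return loss_G.item()   # running value of the bound
```

Here T = mlp(dx + dy + dz, 1) and G = mlp(dz + noise_dim, dy) would each get their own optimizer, and the value of the bound after convergence of the alternating updates serves as the CMI estimate.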
Group Equivariant Deep Reinforcement Learning
Mondal, Arnab Kumar, Nair, Pratheeksha, Siddiqi, Kaleem
In Reinforcement Learning (RL), Convolutional Neural Networks (CNNs) have been successfully applied as function approximators in Deep Q-Learning algorithms, which seek to learn action-value functions and policies in various environments. However, to date, there has been little work on learning symmetry-transformation-equivariant representations of the input environment state. In this paper, we propose the use of Equivariant CNNs to train RL agents and study their inductive bias for transformation-equivariant Q-value approximation. We demonstrate that equivariant architectures can dramatically enhance the performance and sample efficiency of RL agents in a highly symmetric environment while requiring fewer parameters. Additionally, we show that they are robust to changes in the environment caused by affine transformations.
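As a concrete illustration of the equivariance constraint on the action-value function, the sketch below wraps an ordinary PyTorch Q-network and averages it over the four 90-degree rotations of the input state, permuting the discrete actions accordingly. This group-averaging (symmetrization) construction guarantees C4 equivariance by design; it is a simplified stand-in for the steerable Equivariant CNN layers studied in the paper, and the base network and action permutation shown are assumptions for illustration.

```python
import torch
import torch.nn as nn

class C4EquivariantQWrapper(nn.Module):
    """Averages a base Q-network over the C4 rotation group so that rotating
    the input state by 90 degrees permutes the predicted action values."""

    def __init__(self, base_qnet, action_perm):
        super().__init__()
        self.base = base_qnet
        # action_perm[a] = index of action `a` after one 90-degree rotation
        self.register_buffer("perm", torch.as_tensor(action_perm, dtype=torch.long))

    def _sigma(self, k):
        # Action permutation induced by k successive 90-degree rotations.
        idx = torch.arange(self.perm.numel(), device=self.perm.device)
        for _ in range(k):
            idx = self.perm[idx]
        return idx

    def forward(self, state):                        # state: (B, C, H, W)
        q_sum = 0.0
        for k in range(4):
            rotated = torch.rot90(state, k, dims=(2, 3))
            q = self.base(rotated)                   # (B, num_actions)
            q_sum = q_sum + q[:, self._sigma(k)]     # realign the induced action shuffle
        return q_sum / 4.0

# Example base network for a 4-action grid world (up, right, down, left).
base = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 4),
)
q_net = C4EquivariantQWrapper(base, action_perm=[3, 0, 1, 2])  # illustrative permutation
```

The wrapper can replace the Q-network in a standard Deep Q-Learning loop unchanged; the averaging costs four forward passes but bakes the rotational symmetry of the environment into the value estimates.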