VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data

Neural Information Processing Systems

Deep generative models often perform poorly in real-world applications due to the heterogeneity of natural data sets. Heterogeneity arises from data containing different types of features (categorical, ordinal, continuous, etc.) and features of the same type having different marginal distributions. We propose an extension of variational autoencoders (VAEs) called VAEM to handle such heterogeneous data. VAEM is a deep generative model that is trained in a two-stage manner, such that the first stage provides a more uniform representation of the data to the second stage, thereby sidestepping the problems caused by heterogeneous data. We provide extensions of VAEM to handle partially observed data, and demonstrate its performance in data generation, missing data prediction and sequential feature selection tasks. Our results show that VAEM broadens the range of real-world applications where deep generative models can be successfully deployed.
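The two-stage idea in the abstract can be illustrated with a minimal sketch. Note the stand-ins: the paper's first stage fits an independent one-dimensional-latent "marginal VAE" per feature, which is approximated here by an empirical-CDF (rank) transform, and the second-stage dependency VAE is approximated by a full-covariance Gaussian. This shows only the data flow, not the actual VAEM models.

```python
import numpy as np

# Illustrative sketch of VAEM's two-stage data flow, NOT the paper's actual
# implementation: per-feature uniformisation is stood in by an empirical-CDF
# transform, and the stage-two "dependency" model by a Gaussian fit.

def stage_one_uniformise(X):
    """Map each feature to (0, 1) via its empirical CDF, one feature at a time.

    Mimics the role of the per-feature marginal VAEs: heterogeneous marginals
    (heavy-tailed, count-valued, ...) all come out on a comparable scale.
    """
    n, d = X.shape
    Z = np.empty_like(X, dtype=float)
    for j in range(d):                      # each feature handled independently
        ranks = np.argsort(np.argsort(X[:, j]))
        Z[:, j] = (ranks + 0.5) / n         # mid-rank CDF estimate in (0, 1)
    return Z

def stage_two_fit(Z):
    """Fit a joint model over the uniform representation.

    Stands in for the dependency VAE: it only needs to capture dependence
    between features, since stage one already normalised the marginals.
    """
    mean = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False)
    return mean, cov

rng = np.random.default_rng(0)
# Heterogeneous toy data: a heavy-tailed feature next to a small-count feature.
X = np.column_stack([rng.lognormal(size=500), rng.poisson(3, size=500)])
Z = stage_one_uniformise(X)
mean, cov = stage_two_fit(Z)
print(Z.min() > 0.0 and Z.max() < 1.0)   # marginals now on a common scale
```

The point of the decomposition is visible even in this toy version: after stage one, the second-stage model never sees the raw, mismatched scales of the original features.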




Review for NeurIPS paper: VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data

Neural Information Processing Systems

This naturally brings up the question of whether careful tuning of the scaling coefficient for the likelihood function of each dimension could ease the aforementioned optimization difficulties. The "VAE-adaptive" baseline seems to be a data-dependent attempt at this, but I'm not convinced that a single minibatch is sufficient for computing the coefficients for each data type (as described in Appendix C.1.2). In particular, it'd be interesting to see if VAEM would outperform a (possibly hierarchical) VAE with more carefully tuned scaling factors for each dimension to rule out the possibility that the poor performance of vanilla VAE baselines is simply due to hyperparameter tuning.
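The reviewer's suggestion can be made concrete with a small sketch of one plausible data-dependent scaling scheme: weighting each dimension's log-likelihood by the inverse of its minibatch variance. This is an illustrative guess at what such a baseline might look like, not the actual "VAE-adaptive" rule from the paper's Appendix C.1.2.

```python
import numpy as np

# A plausible per-dimension likelihood scaling of the kind the review
# discusses; the weights come from a single minibatch, which is exactly the
# estimation concern the reviewer raises (one batch may be too noisy).
# This is an illustrative guess, NOT the paper's VAE-adaptive scheme.

def minibatch_scaling_coeffs(batch, eps=1e-6):
    """Per-dimension weights from one minibatch: inverse sample variance."""
    return 1.0 / (batch.var(axis=0) + eps)

def weighted_gaussian_loglik(x, mu, sigma, weights):
    """Sum of per-dimension Gaussian log-likelihoods, scaled per dimension."""
    ll = -0.5 * (np.log(2 * np.pi * sigma**2) + ((x - mu) / sigma) ** 2)
    return float((weights * ll).sum())

rng = np.random.default_rng(1)
batch = np.column_stack([rng.normal(0.0, 100.0, 64),   # wide-range feature
                         rng.normal(0.0, 0.1, 64)])    # narrow-range feature
w = minibatch_scaling_coeffs(batch)
# The narrow-range dimension gets a far larger weight than the wide one,
# so it is not drowned out in the total likelihood.
print(w[1] > w[0])
```

Whether such hand-set (or minibatch-estimated) weights can match VAEM's learned first-stage representation is precisely the ablation the reviewer asks for.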


Review for NeurIPS paper: VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data

Neural Information Processing Systems

The paper proposes modelling vectors with dimensions having different types (real-valued and categorical) using a two-stage VAE approach. First, a VAE with a 1D latent is trained once for each input dimension to standardize the data. Then a "dependency" VAE is trained on top of the resulting latents to capture the dependence between them.

Pros:
- The approach is interesting and novel.
- The idea is simple and seems effective, so it might be widely adopted.
- The paper is well written.
- VAEM outperforms sensible baselines at generative modelling and a sequential information acquisition task.

Cons:
- It is not explained why the two-stage training approach is a good idea. The fact that joint training tends to perform less well than two-stage training, as reported in the rebuttal, is an important observation that should be discussed and, ideally, explained in the paper.



VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data

Ma, Chao, Tschiatschek, Sebastian, Hernández-Lobato, José Miguel, Turner, Richard, Zhang, Cheng

arXiv.org Machine Learning
