Review for NeurIPS paper: VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data

Neural Information Processing Systems 

This naturally brings up the question of whether careful tuning of the scaling coefficient for the likelihood function of each dimension could ease the aforementioned optimization difficulties. The "VAE-adaptive" baseline seems to be a data-dependent attempt at this, but I'm not convinced that a single minibatch is sufficient for computing the coefficients for each data type (as described in Appendix C.1.2). In particular, it would be interesting to see whether VAEM still outperforms a (possibly hierarchical) VAE whose scaling factors are tuned more carefully for each dimension. This would rule out the possibility that the poor performance of the vanilla VAE baselines is simply due to insufficient hyperparameter tuning.
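
To make the suggested baseline concrete, here is a minimal sketch of a reconstruction term with one scaling coefficient per dimension, assuming one Gaussian (continuous) and one categorical (discrete) dimension. The function names and the coefficient values are hypothetical and chosen only for illustration; this is not the VAE-adaptive procedure from Appendix C.1.2, and the KL term of the ELBO is omitted.

```python
import numpy as np


def gaussian_nll(x, mean, log_var):
    """Per-example negative log-likelihood of a 1-D Gaussian."""
    return 0.5 * (np.log(2.0 * np.pi) + log_var + (x - mean) ** 2 / np.exp(log_var))


def categorical_nll(labels, logits):
    """Per-example negative log-likelihood of a categorical variable."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def scaled_reconstruction_loss(per_dim_nll, coeffs):
    """Weighted sum of per-dimension NLLs, averaged over the batch.

    per_dim_nll has shape (batch, D); coeffs has shape (D,).
    The coefficients are the hyperparameters one would tune per dimension.
    """
    return float((per_dim_nll * coeffs).sum(axis=1).mean())


# Toy batch of two examples (illustrative values, not data from the paper).
x_cont = np.array([0.0, 1.0])
nll_cont = gaussian_nll(x_cont, mean=np.zeros(2), log_var=np.zeros(2))

labels = np.array([0, 3])
nll_disc = categorical_nll(labels, logits=np.zeros((2, 4)))

per_dim = np.stack([nll_cont, nll_disc], axis=1)  # shape (2, 2)

# Unit coefficients recover the usual unweighted reconstruction term.
loss_unit = scaled_reconstruction_loss(per_dim, np.array([1.0, 1.0]))
# Hypothetical tuned setting that down-weights the categorical dimension.
loss_tuned = scaled_reconstruction_loss(per_dim, np.array([1.0, 0.5]))
```

In the comparison I have in mind, the entries of `coeffs` would be selected by a search on held-out data (or estimated over the full training set) rather than from a single minibatch.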