Reviews: Multimodal Generative Models for Scalable Weakly-Supervised Learning

Neural Information Processing Systems 

This paper presents a generative approach to multimodal deep learning based on a product-of-experts (PoE) inference network. The main idea is to assume that the joint distribution over all modalities factorises into a product of single-modality data-generating distributions when conditioned on the latent variable, and to use this assumption to derive the structure and factorisation of the variational posterior. The proposed model shares parameters across the inference networks so that it can efficiently handle any combination of missing modalities, and experiments indicate the model's efficacy on various benchmark datasets. The idea is intuitive, the exposition is well written and easy to follow, and the results are thorough and compelling. I have a few questions and comments, mainly about the relationship of this work to previous approaches ([15] and [21] in the text).
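For concreteness, the product-of-experts combination summarised above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each expert is a diagonal Gaussian and that a standard-normal prior expert is always included, neither of which is stated in the summary above. The function name `product_of_gaussian_experts` and its arguments are hypothetical. Under these assumptions the product of Gaussians is again Gaussian, with precision equal to the sum of the expert precisions and mean equal to the precision-weighted average of the expert means, so a missing modality is handled by leaving its expert out of the product.

```python
import numpy as np


def product_of_gaussian_experts(means, variances, present):
    """Combine diagonal-Gaussian experts into one Gaussian (hypothetical helper).

    means, variances: arrays of shape (n_modalities, latent_dim), one row per
        single-modality expert (assumed Gaussian for this sketch).
    present: boolean array of shape (n_modalities,); False marks a missing modality.
    Returns the mean and variance of the normalised product.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    present = np.asarray(present, dtype=bool)
    latent_dim = means.shape[1]

    # Assumed prior expert N(0, I); it keeps the product well defined
    # even when every modality is missing.
    prior_precision = np.ones((1, latent_dim))
    prior_mean = np.zeros((1, latent_dim))

    # Missing modalities are dropped from the product entirely.
    precisions = np.concatenate([prior_precision, 1.0 / variances[present]], axis=0)
    expert_means = np.concatenate([prior_mean, means[present]], axis=0)

    # Product of Gaussians: precisions add, means are precision-weighted.
    joint_precision = precisions.sum(axis=0)
    joint_mean = (precisions * expert_means).sum(axis=0) / joint_precision
    return joint_mean, 1.0 / joint_precision


# Two modalities, two latent dimensions (made-up numbers for illustration).
mu = [[1.0, 2.0], [3.0, -2.0]]
var = [[1.0, 1.0], [0.5, 0.5]]
both_mean, both_var = product_of_gaussian_experts(mu, var, [True, True])
# both_mean -> [1.75, -0.5], both_var -> [0.25, 0.25]
first_only_mean, first_only_var = product_of_gaussian_experts(mu, var, [True, False])
# first_only_mean -> [0.5, 1.0], first_only_var -> [0.5, 0.5]
```

The same function serves every subset of observed modalities, which is the sense in which a single set of per-modality experts covers all missing-data patterns in this sketch.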