MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning
Vasco, Miguel; Melo, Francisco S.; Paiva, Ana
Humans are able to create rich representations of their external reality. These internal representations allow for cross-modality inference, in which the available perceptions induce the perceptual experience of missing input modalities. In this paper, we contribute the Multimodal Hierarchical Variational Auto-encoder (MHVAE), a hierarchical multimodal generative model for representation learning. Inspired by human cognitive models, the MHVAE learns modality-specific distributions for an arbitrary number of modalities, as well as a joint-modality distribution that is responsible for cross-modality inference. We formally derive the model's evidence lower bound and propose a novel methodology to approximate the joint-modality posterior based on modality-specific representation dropout. We evaluate the MHVAE on standard multimodal datasets. Our model performs on par with other state-of-the-art generative models in joint-modality reconstruction from arbitrary input modalities and in cross-modality inference.
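The modality-specific representation dropout mentioned in the abstract can be pictured with a minimal sketch. The abstract does not give the details, so everything below is an assumption made for illustration: the function names `drop_modality_representations` and `joint_encoder_input`, the drop probability, masking by zeros, and the rule that at least one modality is always kept. It is not the authors' implementation.

```python
import random
from typing import Dict, List


def drop_modality_representations(
    reps: Dict[str, List[float]],
    drop_prob: float,
    rng: random.Random,
) -> Dict[str, List[float]]:
    """Zero out whole modality-specific representations at random.

    Assumption: at least one modality is always kept, so the joint
    encoder never receives an all-zero input.
    """
    names = list(reps)
    dropped = {name for name in names if rng.random() < drop_prob}
    if names and len(dropped) == len(names):
        # Everything was dropped: restore one modality chosen at random.
        dropped.remove(rng.choice(names))
    return {
        name: [0.0] * len(vec) if name in dropped else list(vec)
        for name, vec in reps.items()
    }


def joint_encoder_input(reps: Dict[str, List[float]]) -> List[float]:
    """Concatenate representations in a fixed (sorted) modality order."""
    return [x for name in sorted(reps) for x in reps[name]]


# Hypothetical modality-specific representations for one sample.
rng = random.Random(0)
reps = {"image": [0.3, -1.2, 0.8], "label": [1.0, 0.5]}
masked = drop_modality_representations(reps, drop_prob=0.5, rng=rng)
joint_in = joint_encoder_input(masked)
```

Under these assumptions, a joint-modality encoder trained on `joint_in` sees inputs with some modalities masked out, which is one plausible way for it to remain usable when a modality is missing at test time.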
Jun-4-2020
- Genre:
  - Research Report (0.64)
- Technology:
  - Information Technology > Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning > Neural Networks (1.00)
    - Robots (0.94)
    - Natural Language > Generation (0.85)