Multimodal Variational Autoencoders for Semi-Supervised Learning: In Defense of Product-of-Experts
Kutuzova, Svetlana, Krause, Oswin, McCloskey, Douglas, Nielsen, Mads, Igel, Christian
arXiv.org Artificial Intelligence
Multimodal generative modelling is important because information about real-world objects typically comes in different representations, or modalities. The information provided by each modality may be erroneous and/or incomplete, and a complete reconstruction of the full information can often only be achieved by combining several modalities. For example, in image- and video-guided translation (Caglayan et al., 2019), additional visual context can potentially resolve ambiguities (e.g., noun genders) when translating written text. In many applications, modalities may be missing for a subset of the observed samples during training and deployment. Often the description of an object in one modality is easy to obtain, while annotating it with another modality is slow and expensive. Given two modalities, we call samples paired when both modalities are present, and unpaired when one is missing. The simplest way to deal with paired and unpaired training examples is to discard the unpaired observations for learning.
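The product-of-experts in the title refers to one standard way of combining the unimodal encoders of a multimodal VAE: each modality contributes a Gaussian "expert" posterior, and their product is again Gaussian with precision-weighted parameters. The sketch below illustrates only this fusion rule under the common Gaussian-experts assumption; the function name, the choice to include a standard-normal prior as an extra expert, and all variable names are illustrative, not the paper's actual code.

```python
import numpy as np

def poe_gaussian(mus, logvars):
    """Combine Gaussian experts N(mu_i, var_i) by a product of experts.

    The product of Gaussians is Gaussian with precision equal to the
    sum of the experts' precisions and mean equal to the
    precision-weighted average of the experts' means.
    """
    precisions = [np.exp(-lv) for lv in logvars]  # 1 / var_i
    # Include a standard-normal prior N(0, 1) as one additional expert,
    # which contributes precision 1 and mean 0.
    prec_sum = sum(precisions) + 1.0
    mu = sum(p * m for p, m in zip(precisions, mus)) / prec_sum
    var = 1.0 / prec_sum
    return mu, var

# Two unit-variance experts (logvar = 0) with means 1 and 3:
mu, var = poe_gaussian(
    [np.array([1.0]), np.array([3.0])],
    [np.array([0.0]), np.array([0.0])],
)
```

A useful property for the paired/unpaired setting described above is that a missing modality is handled by simply omitting its expert from the product, so the same model evaluates on any subset of modalities.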
Jan-18-2021