Multimodal Variational Autoencoders for Semi-Supervised Learning: In Defense of Product-of-Experts
Kutuzova, Svetlana, Krause, Oswin, McCloskey, Douglas, Nielsen, Mads, Igel, Christian
arXiv.org Artificial Intelligence
Multimodal generative modelling is important because information about real-world objects typically comes in different representations, or modalities. The information provided by each modality may be erroneous and/or incomplete, and a complete reconstruction of the full information can often only be achieved by combining several modalities. For example, in image- and video-guided translation (Caglayan et al., 2019), additional visual context can potentially resolve ambiguities (e.g., noun genders) when translating written text. In many applications, modalities may be missing for a subset of the observed samples during training and deployment. Often the description of an object in one modality is easy to obtain, while annotating it with another modality is slow and expensive. Given two modalities, we call samples paired when both modalities are present, and unpaired when one is missing. The simplest way to deal with paired and unpaired training examples is to discard the unpaired observations for learning.
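The product-of-experts in the title refers to one standard way of combining the unimodal encoders of a multimodal VAE: each modality contributes a Gaussian "expert" posterior, and their product is again Gaussian with precision-weighted parameters. The sketch below illustrates only this fusion rule under the common Gaussian-experts assumption; the function name, the choice to include a standard-normal prior as an extra expert, and all variable names are illustrative, not the paper's actual code.

```python
import numpy as np

def poe_gaussian(mus, logvars):
    """Combine Gaussian experts N(mu_i, var_i) by a product of experts.

    The product of Gaussians is Gaussian with precision equal to the
    sum of the experts' precisions and mean equal to the
    precision-weighted average of the experts' means.
    """
    precisions = [np.exp(-lv) for lv in logvars]  # 1 / var_i
    # Include a standard-normal prior N(0, 1) as one additional expert,
    # which contributes precision 1 and mean 0.
    prec_sum = sum(precisions) + 1.0
    mu = sum(p * m for p, m in zip(precisions, mus)) / prec_sum
    var = 1.0 / prec_sum
    return mu, var

# Two unit-variance experts (logvar = 0) with means 1 and 3:
mu, var = poe_gaussian(
    [np.array([1.0]), np.array([3.0])],
    [np.array([0.0]), np.array([0.0])],
)
```

A useful property for the paired/unpaired setting described above is that a missing modality is handled by simply omitting its expert from the product, so the same model evaluates on any subset of modalities.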
Jan-18-2021