Unsupervised Image-to-Image Translation Using Domain-Specific Variational Information Bound

Neural Information Processing Systems

Unsupervised image-to-image translation is a class of computer vision problems which aims at modeling the conditional distribution of images in the target domain, given a set of unpaired images in the source and target domains. An image in the source domain might have multiple representations in the target domain. Therefore, ambiguity arises in modeling the conditional distribution, especially when the images in the source and target domains come from different modalities. Current approaches mostly rely on simplifying assumptions to map both domains into a shared latent space. Consequently, they are only able to model the domain-invariant information between the two modalities. These approaches cannot model domain-specific information which has no representation in the target domain. In this work, we propose an unsupervised image-to-image translation framework which maximizes a domain-specific variational information bound and learns the target domain-invariant representation of the two domains. The proposed framework makes it possible to map a single source image into multiple images in the target domain, using several target domain-specific codes sampled randomly from the prior distribution or extracted from reference images.
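The one-to-many mapping the abstract describes can be illustrated with a toy sketch: a shared content code is extracted once, and several target images are produced by pairing it with different domain-specific codes drawn from the prior. The encoder and decoder below are stand-ins invented for illustration, not the paper's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_content(x):
    # Stand-in for the domain-invariant (content) encoder.
    return x - x.mean(axis=-1, keepdims=True)

def decode(content, style):
    # Stand-in for the target-domain decoder: combines the shared
    # content code with a target domain-specific style code.
    return content + style

x_src = rng.normal(size=(1, 8))       # one toy source-domain "image"
content = encode_content(x_src)       # shared, domain-invariant code

# One source image maps to multiple target images by sampling several
# domain-specific codes from the prior (here a standard Gaussian).
translations = [decode(content, rng.normal(size=(1, 8))) for _ in range(3)]
```

Each sampled code yields a distinct translation, while the shared content code keeps them consistent with the source image.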


Multimodal Generative Models for Scalable Weakly-Supervised Learning

Neural Information Processing Systems

Learning a joint representation of multiple modalities should yield deeper and more useful representations. Previous generative approaches to multimodal input either do not learn a joint distribution or require additional computation to handle missing data. Here, we introduce a multimodal variational autoencoder (MVAE) that uses a product-of-experts inference network and a sub-sampled training paradigm to solve the multimodal inference problem. Notably, our model shares parameters to efficiently learn under any combination of missing modalities. We apply the MVAE on four datasets and match state-of-the-art performance using many fewer parameters. In addition, we show that the MVAE is directly applicable to weakly-supervised learning, and is robust to incomplete supervision. We then consider two case studies, one of learning image transformations---edge detection, colorization, segmentation---as a set of modalities, followed by one of machine translation between two languages. We find appealing results across this range of tasks.
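The product-of-experts combination at the heart of this kind of inference network has a closed form for Gaussian experts: the product of Gaussians is again Gaussian, with precision equal to the sum of the experts' precisions. A minimal sketch (function name and interface are assumptions, not the paper's code):

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Combine Gaussian experts N(mu_i, var_i) into a single Gaussian.

    Precisions add: 1/var = sum_i 1/var_i, and the combined mean is the
    precision-weighted average of the experts' means. Handling a missing
    modality just means omitting its (mu, logvar) pair from the lists.
    """
    precisions = [np.exp(-lv) for lv in logvars]          # 1 / var_i
    total_precision = sum(precisions)
    mu = sum(m * p for m, p in zip(mus, precisions)) / total_precision
    var = 1.0 / total_precision
    return mu, var

# A prior expert N(0, I) plus two modality experts, all with unit variance.
mu, var = product_of_experts(
    mus=[np.zeros(2), np.array([1.0, -1.0]), np.array([3.0, 1.0])],
    logvars=[np.zeros(2), np.zeros(2), np.zeros(2)],
)
# With equal unit variances the combined mean is the simple average
# of the experts' means, and the variance shrinks to 1/3.
```

This closed form is why dropping modalities costs nothing extra at inference time: the remaining experts are simply multiplied together.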




MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing (Jianfei Yang, He Huang, Yunjiao Zhou)

Neural Information Processing Systems

MATLAB, as shown in Table 2. To enhance the sensing quality, we have aggregated five adjacent frames into a new frame for use. For the WiFi CSI data, there are "-inf" values in some sequences. To facilitate the users, we have embedded these processing codes into our dataset tool: when the user loads our WiFi CSI data, these values are handled by linear interpolation. As presented in Section 4.3, we provide temporal annotations; each sequence is annotated by at least 5 human annotators.
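The "-inf" clean-up described above could look like the following sketch, which replaces each non-finite sample by linearly interpolating between its nearest finite neighbours. The function name and the exact handling are assumptions for illustration; the dataset tool's actual code may differ.

```python
import numpy as np

def fill_neg_inf(seq):
    """Replace "-inf" samples in a 1-D CSI amplitude sequence by
    linear interpolation between the nearest finite neighbours."""
    seq = np.asarray(seq, dtype=float)
    bad = np.isneginf(seq)
    good = ~bad
    idx = np.arange(len(seq))
    # np.interp evaluates the piecewise-linear function through the
    # good samples at the positions of the bad ones.
    seq[bad] = np.interp(idx[bad], idx[good], seq[good])
    return seq

out = fill_neg_inf([1.0, float("-inf"), 3.0,
                    float("-inf"), float("-inf"), 9.0])
# The gaps are filled along the line between their finite neighbours:
# [1.0, 2.0, 3.0, 5.0, 7.0, 9.0]
```

A "-inf" at the very start or end of a sequence would be clamped to the nearest finite value by `np.interp`; a real pipeline might instead drop such frames.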