Goto

Collaborating Authors

 Yang, Xuewen


Hierarchical Mutual Information Analysis: Towards Multi-view Clustering in The Wild

arXiv.org Artificial Intelligence

Multi-view clustering (MVC) can explore common semantics from unsupervised views generated by different sources, and thus has been extensively used in applications of practical computer vision. Due to the spatio-temporal asynchronism, multi-view data often suffer from view missing and are unaligned in real-world applications, which makes it difficult to learn consistent representations. To address the above issues, this work proposes a deep MVC framework where data recovery and alignment are fused in a hierarchically consistent way to maximize the mutual information among different views and ensure the consistency of their latent spaces. More specifically, we first leverage dual prediction to fill in missing views while achieving the instance-level alignment, and then take the contrastive reconstruction to achieve the class-level alignment. To the best of our knowledge, this could be the first successful attempt to handle the missing and unaligned data problem separately with different learning paradigms. Extensive experiments on public datasets demonstrate that our method significantly outperforms state-of-the-art methods on multi-view clustering even in the cases of view missing and unalignment.


Exploring External Knowledge for Accurate modeling of Visual and Language Problems

arXiv.org Artificial Intelligence

The interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. The success can be partly attributed to the advancements of deep neural networks made in the sub-fields of AI such as Computer Vision (CV) and Natural Language Processing (NLP). The promising research area that this dissertation focuses on is visual and language understanding which involves many challenging tasks, i.e., classification, detection, segmentation, machine translation and captioning, etc. The state-of-the-art methods for solving these problems usually involves only two parts: source data and target labels, which is rather insufficient especially when the dataset is small. Meanwhile, many external tools or sources can provide extra useful information (external knowledge) that can help improve the performance of these methods. For example, a detection model has been applied to provide better object features than state-of-the-art ResNet for image captioning models. Inspired by this observation, we developed a methodology that we can first extract external knowledge and then integrate it with the original models. The external knowledge has to be extracted from the dataset, or can directly come from external, e.g., grammar rules or scene graphs. We apply this methodology to different AI tasks, including machine translation and image captioning and improve the original state-of-the-art models by a large margin.


Learning Continuous-Time Dynamics by Stochastic Differential Networks

arXiv.org Machine Learning

Learning continuous-time stochastic dynamics is a fundamental and essential problem in modeling sporadic time series, whose observations are irregular and sparse in both time and dimension. For a given system whose latent states and observed data are high-dimensional, it is generally impossible to derive a precise continuous-time stochastic process to describe the system behaviors. To solve the above problem, we apply Variational Bayesian method and propose a flexible continuous-time stochastic recurrent neural network named Variational Stochastic Differential Networks (VSDN), which embeds the complicated dynamics of the sporadic time series by neural Stochastic Differential Equations (SDE). VSDNs capture the stochastic dependency among latent states and observations by deep neural networks. We also incorporate two differential Evidence Lower Bounds to efficiently train the models. Through comprehensive experiments, we show that VSDNs outperform state-of-the-art continuous-time deep learning models and achieve remarkable performance on prediction and interpolation tasks for sporadic time series.


Latent Part-of-Speech Sequences for Neural Machine Translation

arXiv.org Artificial Intelligence

Learning target side syntactic structure has been shown to improve Neural Machine Translation (NMT). However, incorporating syntax through latent variables introduces additional complexity in inference, as the models need to marginalize over the latent syntactic structures. To avoid this, models often resort to greedy search which only allows them to explore a limited portion of the latent space. In this work, we introduce a new latent variable model, LaSyn, that captures the co-dependence between syntax and semantics, while allowing for effective and efficient inference over the latent space. LaSyn decouples direct dependence between successive latent variables, which allows its decoder to exhaustively search through the latent syntactic choices, while keeping decoding speed proportional to the size of the latent variable vocabulary. We implement LaSyn by modifying a transformer-based NMT system and design a neural expectation maximization algorithm that we regularize with part-of-speech information as the latent sequences. Evaluations on four different MT tasks show that incorporating target side syntax with LaSyn improves both translation quality, and also provides an opportunity to improve diversity.