Plotting

 Nguyen, Tu Dinh


Quaternion Graph Neural Networks

arXiv.org Machine Learning

Recently, graph neural networks (GNNs) become a principal research direction to learn low-dimensional continuous embeddings of nodes and graphs to predict node and graph labels, respectively. However, Euclidean embeddings have high distortion when using GNNs to model complex graphs such as social networks. Furthermore, existing GNNs are not very efficient with the high number of model parameters when increasing the number of hidden layers. Therefore, we move beyond the Euclidean space to a hyper-complex vector space to improve graph representation quality and reduce the number of model parameters. To this end, we propose quaternion graph neural networks (QGNN) to generalize GCNs within the Quaternion space to learn quaternion embeddings for nodes and graphs. The Quaternion space, a hyper-complex vector space, provides highly meaningful computations through Hamilton product compared to the Euclidean and complex vector spaces. As a result, our QGNN can reduce the model size up to four times and enhance learning better graph representations. Experimental results show that the proposed QGNN produces state-of-the-art accuracies on a range of well-known benchmark datasets for three downstream tasks, including graph classification, semi-supervised node classification, and text (node) classification.


QuatRE: Relation-Aware Quaternions for Knowledge Graph Embeddings

arXiv.org Artificial Intelligence

We propose a simple and effective embedding model, named QuatRE, to learn quaternion embeddings for entities and relations in knowledge graphs. QuatRE aims to enhance correlations between head and tail entities given a relation within the Quaternion space with Hamilton product. QuatRE achieves this by associating each relation with two quaternion vectors which are used to rotate the quaternion embeddings of the head and tail entities, respectively. To obtain the triple score, QuatRE rotates the rotated embedding of the head entity using the normalized quaternion embedding of the relation, followed by a quaternion-inner product with the rotated embedding of the tail entity. Experimental results show that our QuatRE outperforms up-to-date embedding models on well-known benchmark datasets for knowledge graph completion.


A Self-Attention Network based Node Embedding Model

arXiv.org Machine Learning

Despite several signs of progress have been made recently, limited research has been conducted for an inductive setting where embeddings are required for newly unseen nodes -- a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performances of downstream tasks such as node classification, link prediction or community extraction. To this end, we propose SANNE -- a novel unsupervised embedding model -- whose central idea is to employ a transformer self-attention network to iteratively aggregate vector representations of nodes in random walks. Our SANNE aims to produce plausible embeddings not only for present nodes, but also for newly unseen nodes. Experimental results show that the proposed SANNE obtains state-of-the-art results for the node classification task on well-known benchmark datasets.


Unsupervised Universal Self-Attention Network for Graph Classification

arXiv.org Machine Learning

Existing graph embedding models often have weaknesses in exploiting graph structure similarities, potential dependencies among nodes and global network properties. To this end, we present U2GAN, a novel unsupervised model leveraging on the strength of the recently introduced universal self-attention network (Dehghani et al., 2019), to learn low-dimensional embeddings of graphs which can be used for graph classification. In particular, given an input graph, U2GAN first applies a self-attention computation, which is then followed by a recurrent transition to iteratively memorize its attention on vector representations of each node and its neighbors across each iteration. Thus, U2GAN can address the weaknesses in the existing models in order to produce plausible node embeddings whose sum is the final embedding of the whole graph. Experimental results show that our unsupervised U2GAN produces new state-of-the-art performances on a range of well-known benchmark datasets for the graph classification task. It even outperforms supervised methods in most of benchmark cases.


Multi-Generator Generative Adversarial Nets

arXiv.org Machine Learning

We propose a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapse and delivering state-of-the-art results. A minimax formulation is able to establish among a classifier, a discriminator, and a set of generators in a similar spirit with GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as final output similar to the mechanism of a probabilistic mixture model. We term our method Mixture GAN (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators' distributions and the empirical data distribution is minimal, whilst the JSD among generators' distributions is maximal, hence effectively avoiding the mode collapse. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by generators.


Analogical-based Bayesian Optimization

arXiv.org Machine Learning

Some real-world problems revolve to solve the optimization problem \max_{x\in\mathcal{X}}f\left(x\right) where f\left(.\right) is a black-box function and X might be the set of non-vectorial objects (e.g., distributions) where we can only define a symmetric and non-negative similarity score on it. This setting requires a novel view for the standard framework of Bayesian Optimization that generalizes the core insightful spirit of this framework. With this spirit, in this paper, we propose Analogical-based Bayesian Optimization that can maximize black-box function over a domain where only a similarity score can be defined. Our pathway is as follows: we first base on the geometric view of Gaussian Processes (GP) to define the concept of influence level that allows us to analytically represent predictive means and variances of GP posteriors and base on that view to enable replacing kernel similarity by a more genetic similarity score. Furthermore, we also propose two strategies to find a batch of query points that can efficiently handle high dimensional data.


Dual Discriminator Generative Adversarial Nets

arXiv.org Machine Learning

We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial network (GAN). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus it exploits the complementary statistical properties from these divergences to effectively diversify the estimated density in capturing multi-modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; and together with a generator, it also has the analogy of a minimax game, wherein a discriminator rewards high scores for samples from data distribution whilst another discriminator, conversely, favoring data from the generator, and the generator produces data to fool both two discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both KL and reverse KL divergences between data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN's variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good quality and diverse samples over baselines, and the capability of our method to scale up to ImageNet database.


Statistical Latent Space Approach for Mixed Data Modelling and Applications

arXiv.org Machine Learning

The analysis of mixed data has been raising challenges in statistics and machine learning. One of two most prominent challenges is to develop new statistical techniques and methodologies to effectively handle mixed data by making the data less heterogeneous with minimum loss of information. The other challenge is that such methods must be able to apply in large-scale tasks when dealing with huge amount of mixed data. To tackle these challenges, we introduce parameter sharing and balancing extensions to our recent model, the mixed-variate restricted Boltzmann machine (MV.RBM) which can transform heterogeneous data into homogeneous representation. We also integrate structured sparsity and distance metric learning into RBM-based models. Our proposed methods are applied in various applications including latent patient profile modelling in medical data analysis and representation learning for image retrieval. The experimental results demonstrate the models perform better than baseline methods in medical data and outperform state-of-the-art rivals in image dataset.


Geometric Enclosing Networks

arXiv.org Machine Learning

Training model to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of minimal enclosing ball to train a generator G\left(\bz\right) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring data generated are also lying on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely simple and easy-to-control optimization formulation, avoidance of mode collapsing and efficiently learn data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthesis and real-world datasets to illustrate the behaviors, strength and weakness of our proposed GEN, in particular its ability to handle multi-modal data and quality of generated data.


Approximation Vector Machines for Large-scale Online Learning

arXiv.org Machine Learning

One of the most challenging problems in kernel online learning is to bound the model size and to promote the model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose Approximation Vector Machine (AVM), a model that can simultaneously encourage the sparsity and safeguard its risk in compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for classification task, and $l_1$, $l_2$, and $\epsilon$-insensitive for regression task. We conducted extensive experiments for classification task in batch and online modes, and regression task in online mode over several benchmark datasets. The results show that our proposed AVM achieved a comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM in maintaining the model size.