Country
Cross-GCN: Enhancing Graph Convolutional Network with $k$-Order Feature Interactions
Feng, Fuli, He, Xiangnan, Zhang, Hanwang, Chua, Tat-Seng
Graph Convolutional Network (GCN) is an emerging technique that performs learning and reasoning on graph data. It operates feature learning on the graph structure, through aggregating the features of the neighbor nodes to obtain the embedding of each target node. Owing to the strong representation power, recent research shows that GCN achieves state-of-the-art performance on several tasks such as recommendation and linked document classification. Despite its effectiveness, we argue that existing designs of GCN forgo modeling cross features, making GCN less effective for tasks or data where cross features are important. Although neural network can approximate any continuous function, including the multiplication operator for modeling feature crosses, it can be rather inefficient to do so (i.e., wasting many parameters at the risk of overfitting) if there is no explicit design. To this end, we design a new operator named Cross-feature Graph Convolution, which explicitly models the arbitrary-order cross features with complexity linear to feature dimension and order size. We term our proposed architecture as Cross-GCN, and conduct experiments on three graphs to validate its effectiveness. Extensive analysis validates the utility of explicitly modeling cross features in GCN, especially for feature learning at lower layers.
Spherical Principal Curves
Kim, Jang-Hyun, Lee, Jongmin, Oh, Hee-Seok
This paper presents a new approach for dimension reduction of data observed in a sphere. Several dimension reduction techniques have recently developed for the analysis of non-Euclidean data. As a pioneer work, Hauberg (2016) attempted to implement principal curves on Riemannian manifolds. However, this approach uses approximations to deal with data on Riemannian manifolds, which causes distorted results. In this study, we propose a new approach to construct principal curves on a sphere by a projection of the data onto a continuous curve. Our approach lies in the same line of Hastie and Stuetzle (1989) that proposed principal curves for Euclidean space data. We further investigate the stationarity of the proposed principal curves that satisfy the self-consistency on a sphere. Results from real data analysis with earthquake data and simulation examples demonstrate the promising empirical properties of the proposed approach.
Permute to Train: A New Dimension to Training Deep Neural Networks
We show that Deep Neural Networks (DNNs) can be efficiently trained by permuting neuron connections. We introduce a new family of methods to train DNNs called Permute to Train (P2T). Two implementations of P2T are presented: Stochastic Gradient Permutation and Lookahead Permutation. The former computes permutation based on gradient, and the latter depends on another optimizer to derive the permutation. We empirically show that our proposed method, despite only swapping randomly weighted connections, achieves comparable accuracy to that of Adam on MNIST, Fashion-MNIST, and CIFAR-10 datasets. It opens up possibilities for new ways to train and regularize DNNs.
Adaptive Prediction Timing for Electronic Health Records
Deasy, Jacob, Ercole, Ari, Liò, Pietro
In realistic scenarios, multivariate timeseries evolve over case-by-case time-scales. This is particularly clear in medicine, where the rate of clinical events varies by ward, patient, and application. Increasingly complex models have been shown to effectively predict patient outcomes, but have failed to adapt granularity to these inherent temporal resolutions. As such, we introduce a novel, more realistic, approach to generating patient outcome predictions at an adaptive rate based on uncertainty accumulation in Bayesian recurrent models. We use a Recurrent Neural Network (RNN) and a Bayesian embedding layer with a new aggregation method to demonstrate adaptive prediction timing. Our model predicts more frequently when events are dense or the model is certain of event latent representations, and less frequently when readings are sparse or the model is uncertain. At 48 hours after patient admission, our model achieves equal performance compared to its static-windowed counterparts, while generating patient- and event-specific prediction timings that lead to improved predictive performance over the crucial first 12 hours of the patient stay.
On the performance of deep learning models for time series classification in streaming
Lara-Benítez, Pedro, Carranza-García, Manuel, Martínez-Álvarez, Francisco, Riquelme, José C.
Processing data streams arriving at high speed requires the development of models that can provide fast and accurate predictions. Although deep neural networks are the state-of-the-art for many machine learning tasks, their performance in real-time data streaming scenarios is a research area that has not yet been fully addressed. Nevertheless, there have been recent efforts to adapt complex deep learning models for streaming tasks by reducing their processing rate. The design of the asynchronous dual-pipeline deep learning framework allows to predict over incoming instances and update the model simultaneously using two separate layers. The aim of this work is to assess the performance of different types of deep architectures for data streaming classification using this framework. We evaluate models such as multi-layer perceptrons, recurrent, convolutional and temporal convolutional neural networks over several time-series datasets that are simulated as streams. The obtained results indicate that convolutional architectures achieve a higher performance in terms of accuracy and efficiency.
Bayesian Domain Randomization for Sim-to-Real Transfer
Muratore, Fabio, Eilers, Christian, Gienger, Michael, Peters, Jan
When learning policies for robot control, the real-world data required is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such polices are often not transferable to the real world due to a mismatch between the simulation and reality, called 'reality gap'. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) according to a distribution over domain parameters during training in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This solution is suboptimal in the context of sim-to-real transferability, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes there is prior knowledge about the uncertainty over the domain parameters. Thus, we propose Bayesian Domain Randomization (BayRn), a black box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning by sampling the real-world target domain. BayRn utilizes Bayesian optimization to search the space of source domain distribution parameters which produce a policy that maximizes the real-word objective, allowing for adaptive distributions during policy optimization. We experimentally validate the proposed approach by comparing against two baseline methods on a nonlinear under-actuated swing-up task. Our results show that BayRn is capable to perform direct sim-to-real transfer, while significantly reducing the required prior knowledge.
Adversarial Robustness Through Local Lipschitzness
Yang, Yao-Yuan, Rashtchian, Cyrus, Zhang, Hongyang, Salakhutdinov, Ruslan, Chaudhuri, Kamalika
A standard method for improving the robustness of neural networks is adversarial training, where the network is trained on adversarial examples that are close to the training inputs. This produces classifiers that are robust, but it often decreases clean accuracy. Prior work even posits that the tradeoff between robustness and accuracy may be inevitable. We investigate this tradeoff in more depth through the lens of local Lipschitzness. In many image datasets, the classes are separated in the sense that images with different labels are not extremely close in $\ell_\infty$ distance. Using this separation as a starting point, we argue that it is possible to achieve both accuracy and robustness by encouraging the classifier to be locally smooth around the data. More precisely, we consider classifiers that are obtained by rounding locally Lipschitz functions. Theoretically, we show that such classifiers exist for any dataset such that there is a positive distance between the support of different classes. Empirically, we compare the local Lipschitzness of classifiers trained by several methods. Our results show that having a small Lipschitz constant correlates with achieving high clean and robust accuracy, and therefore, the smoothness of the classifier is an important property to consider in the context of adversarial examples. Code available at https://github.com/yangarbiter/robust-local-lipschitz .
PAC-Bayesian Meta-learning with Implicit Prior
Nguyen, Cuong, Do, Thanh-Toan, Carneiro, Gustavo
We introduce a new and rigorously-formulated PAC-Bayes few-shot meta-learning algorithm that implicitly learns a prior distribution of the model of interest. Our proposed method extends the PAC-Bayes framework from a single task setting to the few-shot learning setting to upper-bound generalisation errors on unseen tasks and samples. We also propose a generative-based approach to model the shared prior and the posterior of task-specific model parameters more expressively compared to the usual diagonal Gaussian assumption. We show that the models trained with our proposed meta-learning algorithm are well calibrated and accurate, with state-of-the-art calibration and classification results on few-shot classification (mini-ImageNet and tiered-ImageNet) and regression (multi-modal task-distribution regression) benchmarks.
Semi-supervised Learning Meets Factorization: Learning to Recommend with Chain Graph Model
Chen, Chaochao, Chang, Kevin C., Li, Qibing, Zheng, Xiaolin
Recently latent factor model (LFM) has been drawing much attention in recommender systems due to its good performance and scalability. However, existing LFMs predict missing values in a user-item rating matrix only based on the known ones, and thus the sparsity of the rating matrix always limits their performance. Meanwhile, semi-supervised learning (SSL) provides an effective way to alleviate the label (i.e., rating) sparsity problem by performing label propagation, which is mainly based on the smoothness insight on affinity graphs. However, graph-based SSL suffers serious scalability and graph unreliable problems when directly being applied to do recommendation. In this paper, we propose a novel probabilistic chain graph model (CGM) to marry SSL with LFM. The proposed CGM is a combination of Bayesian network and Markov random field. The Bayesian network is used to model the rating generation and regression procedures, and the Markov random field is used to model the confidence-aware smoothness constraint between the generated ratings. Experimental results show that our proposed CGM significantly outperforms the state-of-the-art approaches in terms of four evaluation metrics, and with a larger performance margin when data sparsity increases.
TIME: A Transparent, Interpretable, Model-Adaptive and Explainable Neural Network for Dynamic Physical Processes
Singh, Gurpreet, Gupta, Soumyajit, Lease, Matt, Dawson, Clint N.
Partial Differential Equations are infinite dimensional encoded representations of physical processes. However, imbibing multiple observation data towards a coupled representation presents significant challenges. We present a fully convolutional architecture that captures the invariant structure of the domain to reconstruct the observable system. The proposed architecture is significantly low-weight compared to other networks for such problems. Our intent is to learn coupled dynamic processes interpreted as deviations from true kernels representing isolated processes for model-adaptivity. Experimental analysis shows that our architecture is robust and transparent in capturing process kernels and system anomalies. We also show that high weights representation is not only redundant but also impacts network interpretability. Our design is guided by domain knowledge, with isolated process representations serving as ground truths for verification. These allow us to identify redundant kernels and their manifestations in activation maps to guide better designs that are both interpretable and explainable unlike traditional deep-nets.