AdaCare: Explainable Clinical Health Status Representation Learning via Scale-Adaptive Feature Extraction and Recalibration
Ma, Liantao, Gao, Junyi, Wang, Yasha, Zhang, Chaohe, Wang, Jiangtao, Ruan, Wenjie, Tang, Wen, Gao, Xin, Ma, Xinyu
Deep learning-based health status representation learning and clinical prediction have attracted considerable research interest in recent years. Existing models have shown superior performance, but several major issues have not yet been fully taken into consideration. First, the historical variation patterns of biomarkers at diverse time scales play a vital role in indicating health status, but they have not been explicitly extracted by existing works. Second, the key factors that strongly indicate health risk differ among patients, and it is still challenging to make adaptive use of features for patients in diverse conditions. Third, using prediction models as black boxes limits their reliability in clinical practice, yet none of the existing works provides satisfying interpretability while also achieving high prediction performance. In this work, we develop a general health status representation learning model, named AdaCare. It captures the long- and short-term variations of biomarkers as clinical features to depict health status at multiple time scales. It also models the correlation between clinical features to enhance those which strongly indicate health status, and thus maintains state-of-the-art prediction accuracy while providing qualitative interpretability. We conduct a health risk prediction experiment on two real-world datasets. Experimental results indicate that AdaCare outperforms state-of-the-art approaches and provides effective interpretability, which is verifiable by clinical experts.
The Nonstochastic Control Problem
Hazan, Elad, Kakade, Sham M., Singh, Karan
We consider the problem of controlling an unknown linear dynamical system with non-stochastic adversarial perturbations and adversarial convex loss functions. The possibility of sub-linear regret was posed as an open question by [Tu 2019]. We answer this question in the affirmative by giving an efficient algorithm that guarantees a regret of T^(2/3). Crucial for our result is a system identification procedure that is provably effective under adversarial perturbations.
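As a reading aid (not taken from the paper), regret can be written as below under the usual online-control setup, which is an assumption here: x_t and u_t are the state and control at time t, c_t is the adversarial convex loss, and Π is a hypothetical comparator class of policies whose trajectories (x_t^π, u_t^π) face the same perturbations. A bound of order T^(2/3) counts as sub-linear because the average regret per step vanishes:

```latex
\begin{aligned}
\mathrm{Regret}_T &= \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\bigl(x_t^{\pi}, u_t^{\pi}\bigr), \\
\mathrm{Regret}_T = O\bigl(T^{2/3}\bigr) \;&\Longrightarrow\; \frac{\mathrm{Regret}_T}{T} = O\bigl(T^{-1/3}\bigr) \to 0 \quad (T \to \infty).
\end{aligned}
```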
Adaptive Initialization Method for K-means Algorithm
Yang, Jie, Wang, Yu-Kai, Yao, Xin, Lin, Chin-Teng
The K-means algorithm is a widely used clustering algorithm that offers simplicity and efficiency. However, the traditional K-means algorithm determines the initial cluster centers randomly, which makes the clustering results prone to local optima and thus degrades clustering performance. Many initialization methods have been proposed, but none of them can dynamically adapt to datasets with various characteristics. In our previous research, an initialization method for K-means based on hybrid distance was proposed, and this algorithm can adapt to datasets with different characteristics. However, it has the following drawbacks: (a) when calculating density, the threshold cannot be uniquely determined, resulting in unstable results; (b) it depends heavily on parameter tuning, as the parameter must be adjusted five times to obtain better clustering results; (c) its time complexity is quadratic, which makes it difficult to apply to large datasets. In this paper, we propose an adaptive initialization method for the K-means algorithm (AIMK) to improve on our previous work. AIMK can not only adapt to datasets with various characteristics but also obtain better clustering results within two interactions. In addition, we apply random sampling to AIMK, yielding a variant named AIMK-RS, to reduce the time complexity. AIMK-RS is easily applied to large and high-dimensional datasets. We compared AIMK and AIMK-RS with 10 different algorithms on 16 normal and six extra-large datasets. The experimental results show that AIMK and AIMK-RS outperform the current initialization methods and several well-known clustering algorithms. Furthermore, AIMK-RS can significantly reduce the complexity of clustering extra-large, high-dimensional datasets; its time complexity is O(n).
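As a point of reference, the traditional baseline criticized above, Lloyd's K-means with randomly chosen initial centers, can be sketched in plain Python as follows. This is not AIMK or AIMK-RS; the function names and the `seed` parameter are hypothetical, chosen only to show that the outcome depends on which data points happen to be drawn as initial centers.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_random_init(points: List[Point], k: int, seed: int = 0,
                       max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's K-means with the traditional random initialization
    (hypothetical helper): k distinct data points are drawn uniformly
    at random and used as the initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                         for d in range(dim)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels


def sse(points: List[Point], centers: List[Point], labels: List[int]) -> float:
    """Within-cluster sum of squared errors, the objective K-means locally minimizes."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
```

On data with overlapping clusters, different seeds may converge to different local optima with different `sse` values; that sensitivity is what initialization methods such as AIMK aim to reduce.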
Multi-Range Attentive Bicomponent Graph Convolutional Network for Traffic Forecasting
Chen, Weiqi, Chen, Ling, Xie, Yu, Cao, Wei, Gao, Yusong, Feng, Xiaojie
Traffic forecasting is of great importance to transportation management and public safety, and is very challenging due to the complicated spatial-temporal dependency and essential uncertainty brought about by the road network and traffic conditions. Recent studies mainly focus on modeling the spatial dependency by utilizing graph convolutional networks (GCNs) over a fixed weighted graph. However, edges, i.e., the correlations between pair-wise nodes, are much more complicated and interact with each other. In this paper, we propose the Multi-Range Attentive Bicomponent GCN (MRA-BGCN), a novel deep learning model for traffic forecasting. We first build the node-wise graph according to the road network distance and the edge-wise graph according to various edge interaction patterns. Then, we implement the interactions of both nodes and edges using bicomponent graph convolution. The multi-range attention mechanism is introduced to aggregate information in different neighborhood ranges and automatically learn the importance of different ranges. Extensive experiments on two real-world road network traffic datasets, METR-LA and PEMS-BAY, show that our MRA-BGCN achieves state-of-the-art results.
A tale of two toolkits, report the second: bake off redux. Chapter 1. dictionary based classifiers
Bagnall, Anthony, Large, James, Middlehurst, Matthew
Time series classification (TSC) is the problem of learning labels from time-dependent data. One class of algorithms is derived from a bag-of-words approach: a window is run along a series, each subseries is shortened and discretised to form a word, and features are formed from the histogram of word occurrence frequencies. We call this type of approach to TSC dictionary based classification. We compare four dictionary based algorithms in the context of a wider project to update the great time series classification bakeoff, a comparative study published in 2017. We experimentally characterise the algorithms in terms of predictive performance, time complexity and space complexity. We find that we can improve on the previous best in terms of accuracy, but this comes at the cost of time and space. Alternatively, the same performance can be achieved with far less cost. We review the relative merits of the four algorithms before suggesting a path to possible improvement.
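The bag-of-words pipeline described above can be sketched in a few lines of Python. This is a simplified SAX-style illustration assumed for exposition, not any of the four algorithms the paper compares: each window is z-normalised, shortened by piecewise aggregate approximation (PAA), and discretised against fixed Gaussian breakpoints, and a 1-nearest-neighbour rule on the word histograms stands in for a classifier. All names and parameters (`window`, `word_len`, the four-letter alphabet) are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import List, Sequence

# Breakpoints splitting N(0, 1) into 4 equiprobable regions (SAX-style).
BREAKPOINTS = (-0.6745, 0.0, 0.6745)
ALPHABET = "abcd"


def znorm(xs: Sequence[float]) -> List[float]:
    """Z-normalise a subseries; a flat subseries maps to all zeros."""
    mu, sd = mean(xs), pstdev(xs)
    if sd == 0:
        return [0.0] * len(xs)
    return [(x - mu) / sd for x in xs]


def paa(xs: Sequence[float], word_len: int) -> List[float]:
    """Shorten a subseries to word_len segment means
    (trailing points are dropped if len(xs) is not divisible by word_len)."""
    seg = len(xs) // word_len
    return [mean(xs[i * seg:(i + 1) * seg]) for i in range(word_len)]


def symbol(v: float) -> str:
    """Discretise one value: count how many breakpoints it exceeds."""
    return ALPHABET[sum(v > b for b in BREAKPOINTS)]


def bag_of_words(series: Sequence[float], window: int, word_len: int) -> Counter:
    """Slide a window along the series and histogram the resulting words."""
    words: Counter = Counter()
    for start in range(len(series) - window + 1):
        sub = znorm(series[start:start + window])
        words["".join(symbol(v) for v in paa(sub, word_len))] += 1
    return words


def hist_distance(a: Counter, b: Counter) -> int:
    """Squared Euclidean distance between two word histograms."""
    return sum((a[w] - b[w]) ** 2 for w in set(a) | set(b))


def classify_1nn(train: List[Sequence[float]], labels: List[str],
                 query: Sequence[float], window: int, word_len: int) -> str:
    """Label the query with the class of the nearest training histogram."""
    q = bag_of_words(query, window, word_len)
    dists = [hist_distance(bag_of_words(s, window, word_len), q) for s in train]
    return labels[dists.index(min(dists))]
```

For example, every window of a linearly increasing series z-normalises to the same shape, so such a series yields a histogram concentrated on a single word; this shift and scale invariance is one reason dictionary methods normalise each window.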
SecureGBM: Secure Multi-Party Gradient Boosting
Fengy, Zhi, Xiong, Haoyi, Song, Chuanyuan, Yang, Sijia, Zhao, Baoxin, Wang, Licheng, Chen, Zeyu, Yang, Shengwen, Liu, Liping, Huan, Jun
Federated machine learning systems have been widely used to facilitate joint data analytics across distributed datasets owned by different parties that do not trust each other. In this paper, we propose SecureGBM, a novel Gradient Boosting Machine (GBM) framework built on a multi-party computation model based on semi-homomorphic encryption, in which every involved party can jointly obtain a shared GBM model while protecting its own data from potential privacy leakage and inferential identification. More specifically, our work focuses on a "dual-party" secure learning scenario: each of the two parties owns a unique view (i.e., a set of attributes or features) of the same group of samples, while only one party owns the labels. In this scenario, neither feature nor label data may be shared with the other party. To achieve this goal, we first extend LightGBM, a well-known implementation of tree-based GBM, by covering its key training and inference operations with SEAL homomorphic encryption schemes. However, the performance of this re-implementation is significantly bottlenecked by the explosive growth of the communication payloads, since the ciphertext size grows with the length of the plaintexts. We therefore propose using stochastic approximation techniques to reduce the communication payloads while accelerating the overall training procedure in a statistical manner. Our experiments on real-world data show that SecureGBM can secure the communication and computation of LightGBM training and inference for both parties while losing less than 3% AUC, using the same number of gradient boosting iterations, on a wide range of benchmark datasets.
Composition operators on reproducing kernel Hilbert spaces with analytic positive definite functions
Ikeda, Masahiro, Ishikawa, Isao, Sawano, Yoshihiro
Composition operators have been extensively studied in complex analysis, and recently they have been utilized in engineering and machine learning. Here, we focus on composition operators induced by maps on Euclidean spaces and acting on reproducing kernel Hilbert spaces associated with analytic positive definite functions, and prove that the maps are affine if the composition operators are bounded. Our result covers composition operators on Paley-Wiener spaces and on reproducing kernel Hilbert spaces associated with the Gaussian kernel on ${\mathbb R}^d$, which are widely used in engineering.
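For orientation, the objects in the abstract can be written as follows. The definition of the composition operator is standard; writing the kernel as k(x, y) = φ(x − y) for an analytic positive definite function φ, and the Gaussian bandwidth σ, are assumptions made here for illustration, and the implication restates the abstract's result rather than the paper's precise hypotheses.

```latex
% Composition operator induced by a map f : R^d -> R^d on the RKHS H_k
C_f\, g := g \circ f, \qquad g \in H_k, \qquad k(x, y) = \varphi(x - y),
% Result as summarized in the abstract
C_f : H_k \to H_k \ \text{bounded} \;\Longrightarrow\; f(x) = Ax + b
\ \text{ for some } A \in \mathbb{R}^{d \times d},\ b \in \mathbb{R}^{d},
% A kernel covered by the result: the Gaussian kernel
k(x, y) = \exp\!\left( -\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}} \right).
```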
GRIm-RePR: Prioritising Generating Important Features for Pseudo-Rehearsal
Atkinson, Craig, McCane, Brendan, Szymanski, Lech, Robins, Anthony
Pseudo-rehearsal allows neural networks to learn a sequence of tasks without forgetting how to perform earlier tasks. Forgetting is prevented by introducing a generative network which can produce data from previously seen tasks, so that this data can be rehearsed alongside learning the new task. This has been found to be effective in both supervised and reinforcement learning. Our current work aims to further prevent forgetting by encouraging the generator to accurately generate features important for task retention. More specifically, the generator is improved by introducing a second discriminator into the Generative Adversarial Network, which learns to classify items as real or fake from the intermediate activation patterns they produce when fed through a continual learning agent. Using Atari 2600 games, we experimentally find that improving the generator can considerably reduce catastrophic forgetting compared to the standard pseudo-rehearsal methods used in deep reinforcement learning. Furthermore, we propose normalising the Q-values taught to the long-term system, as we observe that this substantially reduces catastrophic forgetting by minimising the interference between tasks' reward functions.
SAG-VAE: End-to-end Joint Inference of Data Representations and Feature Relations
Wang, Chen, Deng, Chengyuan, Ivanov, Vladimir
Variational Autoencoders (VAEs) are powerful for inferring data representations, but neither their vanilla form nor their common variants can learn relations between features. The ability to capture relations within data can provide the inductive bias needed to build more robust Machine Learning algorithms with more interpretable results. In this paper, inspired by recent advances in relational learning using Graph Neural Networks, we propose the Self-Attention Graph Variational AutoEncoder (SAG-VAE) network, which can simultaneously learn feature relations and data representations in an end-to-end manner. SAG-VAE is trained by jointly inferring the posterior distribution of two types of latent variables, which denote the data representation and a shared graph structure, respectively. Furthermore, we introduce a novel self-attention graph network that improves the generative capabilities of SAG-VAE by parameterizing the generative distribution, allowing SAG-VAE to generate new data via graph convolution while remaining trainable via backpropagation. A learnable relational graph representation enhances SAG-VAE's robustness to perturbation and noise, while also providing deeper intuition into model performance. Experiments on graphs show that SAG-VAE is capable of approximately retrieving edges and links between nodes based entirely on feature observations. Finally, results on image data illustrate that SAG-VAE is fairly robust against perturbations in image reconstruction and sampling.
Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis
Nguyen, Thanh V., Wong, Raymond K. W., Hegde, Chinmay
A remarkable recent discovery in machine learning has been that deep neural networks can achieve impressive performance (in terms of both lower training error and higher generalization capacity) in the regime where they are massively over-parameterized. Consequently, over the last several months, the community has shown growing interest in analyzing the optimization and generalization properties of over-parameterized networks, and several breakthrough works have led to important theoretical progress. However, the majority of existing work applies only to supervised learning scenarios and hence is limited to settings such as classification and regression. In contrast, the role of over-parameterization in the unsupervised setting has received far less attention. In this paper, we study the gradient dynamics of two-layer over-parameterized autoencoders with ReLU activation. We make very few assumptions about the given training dataset (other than mild non-degeneracy conditions). Starting from a randomly initialized autoencoder network, we rigorously prove the linear convergence of gradient descent in two learning regimes, namely: (i) the weakly-trained regime where only the encoder is trained, and (ii) the jointly-trained regime where both the encoder and the decoder are trained. Our results indicate the considerable benefits of joint training over weak training for finding global optima, achieving a dramatic decrease in the required level of over-parameterization. We also analyze the case of weight-tied autoencoders (a commonly used architectural choice in practical settings) and prove that in the over-parameterized setting, training such networks from randomly initialized points leads to certain unexpected degeneracies.