Country
Topological Stability: a New Algorithm for Selecting The Nearest Neighbors in Non-Linear Dimensionality Reduction Techniques
Elhenawy, Mohammed, Masoud, Mahmoud, Glaser, Sebastian, Rakotonirainy, Andry
In the machine learning field, dimensionality reduction is an important task. It mitigates the undesired properties of high-dimensional spaces to facilitate classification, compression, and visualization of high-dimensional data. During the last decade, researchers proposed many new (non-linear) techniques for dimensionality reduction. Most of these techniques are based on the intuition that data lies on or near a complex low-dimensional manifold that is embedded in the high-dimensional space. New techniques for dimensionality reduction aim at identifying and extracting the manifold from the high-dimensional space. Isomap is one of widely-used low-dimensional embedding methods, where geodesic distances on a weighted graph are incorporated with the classical scaling (metric multidimensional scaling). The Isomap chooses the nearest neighbours based on the distance only which causes bridges and topological instability. In this paper, we propose a new algorithm to choose the nearest neighbours to reduce the number of short-circuit errors and hence improves the topological stability. Because at any point on the manifold, that point and its nearest neighbours form a vector subspace and the orthogonal to that subspace is orthogonal to all vectors spans the vector subspace. The prposed algorithmuses the point itself and its two nearest neighbours to find the bases of the subspace and the orthogonal to that subspace which belongs to the orthogonal complementary subspace. The proposed algorithm then adds new points to the two nearest neighbours based on the distance and the angle between each new point and the orthogonal to the subspace. The superior performance of the new algorithm in choosing the nearest neighbours is confirmed through experimental work with several datasets.
Learning to Optimize in Swarms
Cao, Yue, Chen, Tianlong, Wang, Zhangyang, Shen, Yang
Learning to optimize has emerged as a powerful framework for various optimization and machine learning tasks. Current such "meta-optimizers" often learn in the space of continuous optimization algorithms that are point-based and uncertainty-unaware. To overcome the limitations, we propose a meta-optimizer that learns in the algorithmic space of both point-based and population-based optimization algorithms. The meta-optimizer targets at a meta-loss function consisting of both cumulative regret and entropy. Specifically, we learn and interpret the update formula through a population of LSTMs embedded with sample- and feature-level attentions. Meanwhile, we estimate the posterior directly over the global optimum and use an uncertainty measure to help guide the learning process. Empirical results over non-convex test functions and the protein-docking application demonstrate that this new meta-optimizer outperforms existing competitors.
InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions
Vashishth, Shikhar, Sanyal, Soumya, Nitin, Vikram, Agrawal, Nilesh, Talukdar, Partha
Most existing knowledge graphs suffer from incompleteness, which can be alleviated by inferring missing links based on known facts. One popular way to accomplish this is to generate low-dimensional embeddings of entities and relations, and use these to make inferences. ConvE, a recently proposed approach, applies convolutional filters on 2D reshapings of entity and relation embeddings in order to capture rich interactions between their components. However, the number of interactions that ConvE can capture is limited. In this paper, we analyze how increasing the number of these interactions affects link prediction performance, and utilize our observations to propose InteractE. InteractE is based on three key ideas -- feature permutation, a novel feature reshaping, and circular convolution. Through extensive experiments, we find that InteractE outperforms state-of-the-art convolutional link prediction baselines on FB15k-237. Further, InteractE achieves an MRR score that is 9%, 7.5%, and 23% better than ConvE on the FB15k-237, WN18RR and YAGO3-10 datasets respectively. The results validate our central hypothesis -- that increasing feature interaction is beneficial to link prediction performance. We make the source code of InteractE available to encourage reproducible research.
Causality-based Feature Selection: Methods and Evaluations
Yu, Kui, Guo, Xianjie, Liu, Lin, Li, Jiuyong, Wang, Hao, Ling, Zhaolong, Wu, Xindong
Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture causal relationships between them. It has been shown that the knowledge about the causal relationships between features and the class variable has potential benefits for building interpretable and robust prediction models, since causal relationships imply the underlying mechanism of a system. Consequently, causality-based feature selection has gradually attracted greater attentions and many algorithms have been proposed. In this paper, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in the research area and make it easy for the comparisons between new methods and existing ones, we develop the first open-source package, called CausalFS, which consists of most of the representative causality-based feature selection algorithms (available at https://github.com/kuiy/CausalFS). Using CausalFS, we conduct extensive experiments to compare the representative algorithms with both synthetic and real-world data sets. Finally, we discuss some challenging problems to be tackled in future causality-based feature selection research.
Working Memory Graphs
Loynd, Ricky, Fernandez, Roland, Celikyilmaz, Asli, Swaminathan, Adith, Hausknecht, Matthew
A BSTRACT Transformers have increasingly outperformed gated RNNs in obtaining new state-of-the-art results on supervised tasks involving text sequences. Inspired by this trend, we study the question of how Transformer-based models can improve the performance of sequential decision-making agents. We present the Working Memory Graph (WMG), an agent that employs multi-head self-attention to reason over a dynamic set of vectors representing observed and recurrent state. We evaluate WMG in two partially observable environments, one that requires complex reasoning over past observations, and another that features factored observations. We find that WMG significantly outperforms gated RNNs on these tasks, supporting the hypothesis that WMG's inductive bias in favor of learning and leveraging factored representations can dramatically boost sample efficiency in environments featuring such structure. In the RNN-based approach of Sutskever et al. (2014), an encoder RNN maps an input sentence to a series of internal hidden state vectors. The encoder's final hidden state is copied into a decoder RNN, which then generates another sequence of hidden states that determine the selection of output tokens in the target language. This model can be trained to translate sentences, but translation quality deteriorates on long sentences where long-term dependencies become critical.
Robust Anomaly Detection and Backdoor Attack Detection Via Differential Privacy
Du, Min, Jia, Ruoxi, Song, Dawn
Outlier detection and novelty detection are two important topics for anomaly detection. Suppose the majority of a dataset are drawn from a certain distribution, outlier detection and novelty detection both aim to detect data samples that do not fit the distribution. Outliers refer to data samples within this dataset, while novelties refer to new samples. In the meantime, backdoor poisoning attacks for machine learning models are achieved through injecting poisoning samples into the training dataset, which could be regarded as "outliers" that are intentionally added by attackers. Differential privacy has been proposed to avoid leaking any individual's information, when aggregated analysis is performed on a given dataset. It is typically achieved by adding random noise, either directly to the input dataset, or to intermediate results of the aggregation mechanism. In this paper, we demonstrate that applying differential privacy can improve the utility of outlier detection and novelty detection, with an extension to detect poisoning samples in backdoor attacks. We first present a theoretical analysis on how differential privacy helps with the detection, and then conduct extensive experiments to validate the effectiveness of differential privacy in improving outlier detection, novelty detection, and backdoor attack detection.
Missingness as Stability: Understanding the Structure of Missingness in Longitudinal EHR data and its Impact on Reinforcement Learning in Healthcare
Fleming, Scott L., Jeyapragasan, Kuhan, Duan, Tony, Ding, Daisy, Gombar, Saurabh, Shah, Nigam, Brunskill, Emma
There is an emerging trend in the reinforcement learning for healthcare literature. In order to prepare longitudinal, irregularly sampled, cli nical datasets for reinforcement learning algorithms, many researchers will resa mple the time series data to short, regular intervals and use last-observation- carried-forward (LOCF) imputation to fill in these gaps. Typically, they will not mai ntain any explicit information about which values were imputed. In this work, w e (1) call attention to this practice and discuss its potential implication s; (2) propose an alternative representation of the patient state that addresses som e of these issues; and (3) demonstrate in a novel but representative clinical data set that our alternative representation yields consistently better results for ach ieving optimal control, as measured by off-policy policy evaluation, compared to repr esentations that do not incorporate missingness information.
Sensory Optimization: Neural Networks as a Model for Understanding and Creating Art
This article is about the cognitive science of visual art. Artists create physical artifacts (such as sculptures or paintings) which depict people, objects, and events. These depictions are usually stylized rather than photo-realistic. How is it that humans are able to understand and create stylized representations? Does this ability depend on general cognitive capacities or an evolutionary adaptation for art? What role is played by learning and culture? Machine Learning can shed light on these questions. It's possible to train convolutional neural networks (CNNs) to recognize objects without training them on any visual art. If such CNNs can generalize to visual art (by creating and understanding stylized representations), then CNNs provide a model for how humans could understand art without innate adaptations or cultural learning. I argue that Deep Dream and Style Transfer show that CNNs can create a basic form of visual art, and that humans could create art by similar processes. This suggests that artists make art by optimizing for effects on the human object-recognition system. Physical artifacts are optimized to evoke real-world objects for this system (e.g. to evoke people or landscapes) and to serve as superstimuli for this system.
Taming Reasoning in Temporal Probabilistic Relational Models
Gehrke, Marcel, Möller, Ralf, Braun, Tanya
Evidence often grounds temporal probabilistic relational models over time, which makes reasoning infeasible. To counteract groundings over time and to keep reasoning polynomial by restoring a lifted representation, we present temporal approximate merging (T AMe), which incorporates (i) clustering for grouping submodels as well as (ii) statistical significance checks to test the fitness of the clustering outcome. In exchange for faster runtimes, T AMe introduces a bounded error that becomes negligible over time. Empirical results show that T AMe significantly improves the runtime performance of inference, while keeping errors small. Introduction Temporal probabilistic relational models express relations between objects, modelling uncertainty as well as temporal aspects. Within one time step, a temporal model is considered static. Performing inference on such models requires algorithms to efficiently handle the temporal aspect to be able to efficiently answer queries. Reasoning in lifted representations has a complexity polynomial in domain sizes. But, models dissolve into ground instances through evidence, which no longer permits reasoning in polynomial time, making query answering infeasible for any reasoning algorithm, exact or approximate. Thus, a key challenge during inference in temporal models is to restore a lifted, i.e., non-grounded, representation. Therefore, we formulate and study the problem of keeping reasoning polynomial (KRP) in temporal models to tame the effect of evidence for efficient query answering. First-order probabilistic inference leverages the relational aspect of a static model, using representatives for groups of indistinguishable, known objects, also known as lifting (Poole 2003). Poole (2003) presents parametric factor graphs as relational models and proposes lifted variable elimination (L VE) as an exact inference algorithm on relational models.
On Value Discrepancy of Imitation Learning
Imitation learning trains a policy from expert demonstrations. Imitation learning approaches have been designed from various principles, such as behavioral cloning via supervised learning, apprenticeship learning via inverse reinforcement learning, and GAIL via generative adversarial learning. In this paper, we propose a framework to analyze the theoretical property of imitation learning approaches based on discrepancy propagation analysis. Under the infinite-horizon setting, the framework leads to the value discrepancy of behavioral cloning in an order of O((1-\gamma)^{-2}). We also show that the framework leads to the value discrepancy of GAIL in an order of O((1-\gamma)^{-1}). It implies that GAIL has less compounding errors than behavioral cloning, which is also verified empirically in this paper. To the best of our knowledge, we are the first one to analyze GAIL's performance theoretically. The above results indicate that the proposed framework is a general tool to analyze imitation learning approaches. We hope our theoretical results can provide insights for future improvements in imitation learning algorithms.