On the Lattice of Conceptual Measurements

arXiv.org Artificial Intelligence

Beyond that, almost every data set is further scaled prior to (data) processing to meet the requirements of the employed data analysis method, for example through the introduction of artificial metrics or the numerical representation of nominal features. This scaling is usually accompanied by a grade of detail, which in turn is becoming more and more of a problem for data science tasks as the availability of features increases and their human explainability decreases. Machine learning methods often used to deal with this problem, such as principal component analysis, enforce particular, possibly inapt, levels of measurement, e.g., food tastes represented by real numbers, and amplify the explainability problem. Therefore, understanding the set of possible scaling maps, identifying its (algebraic) properties, and deriving to some extent human-explainable control over it is a pressing problem. This is especially important since found patterns and dependencies may be artifacts of some scaling map and may therefore corrupt any subsequent task, e.g., classification tasks.


Driving Behavior Explanation with Multi-level Fusion

arXiv.org Artificial Intelligence

In this era of active development of autonomous vehicles, it becomes crucial to provide driving systems with the capacity to explain their decisions. In this work, we focus on generating high-level driving explanations as the vehicle drives. We present BEEF, for BEhavior Explanation with Fusion, a deep architecture which explains the behavior of a trajectory prediction model. Supervised by annotations of human justifications for driving decisions, BEEF learns to fuse features from multiple levels. Leveraging recent advances in the multi-modal fusion literature, BEEF is carefully designed to model the correlations between high-level decision features and mid-level perceptual features. The flexibility and efficiency of our approach are validated with extensive experiments on the HDD and BDD-X datasets.


Molecule Optimization via Fragment-based Generative Models

arXiv.org Machine Learning

In drug discovery, molecule optimization is an important step that modifies drug candidates into better ones in terms of desired drug properties. With recent advances in Artificial Intelligence, this traditionally in vitro process has been increasingly facilitated by in silico approaches. We present an innovative in silico approach to computationally optimizing molecules and formulate the problem as generating optimized molecular graphs via deep generative models. Our generative models follow the key idea of fragment-based drug design and optimize molecules by modifying their small fragments. Our models learn how to identify the to-be-optimized fragments and how to modify such fragments by learning from the differences between molecules that have good and bad properties. In optimizing a new molecule, our models apply the learned signals to decode optimized fragments at the predicted location of the fragments. We also combine multiple such models into a pipeline such that each model in the pipeline is able to optimize one fragment, and thus the entire pipeline is able to modify multiple fragments of a molecule if needed. We compare our models with other state-of-the-art methods on benchmark datasets and demonstrate that our methods significantly outperform others, with more than 80% property improvement under moderate molecular similarity constraints and more than 10% property improvement under high molecular similarity constraints.


Variational Autoencoders for Learning Nonlinear Dynamics of Physical Systems

arXiv.org Artificial Intelligence

We develop data-driven methods for incorporating physical information as priors to learn parsimonious representations of nonlinear systems arising from parameterized PDEs and mechanics. Our approach is based on Variational Autoencoders (VAEs) for learning nonlinear state space models from observations. We develop ways to incorporate geometric and topological priors through general manifold latent space representations. We investigate the performance of our methods for learning low-dimensional representations for the nonlinear Burgers equation and constrained mechanical systems.


A PAC-Bayesian Perspective on Structured Prediction with Implicit Loss Embeddings

arXiv.org Machine Learning

Many practical machine learning tasks can be framed as structured prediction problems, where several output variables are predicted and considered interdependent. Recent theoretical advances in structured prediction have focused on obtaining fast-rate convergence guarantees, especially in the Implicit Loss Embedding (ILE) framework. PAC-Bayes has gained interest recently for its capacity to produce tight risk bounds for predictor distributions. This work proposes a novel PAC-Bayes perspective on the ILE structured prediction framework. We present two generalization bounds, on the risk and excess risk, which yield insights into the behavior of ILE predictors. Two learning algorithms are derived from these bounds.


Autoencoding Variational Autoencoder

arXiv.org Machine Learning

Does a Variational AutoEncoder (VAE) consistently encode typical samples generated from its decoder? This paper shows that the perhaps surprising answer to this question is 'No'; a (nominally trained) VAE does not necessarily amortize inference for typical samples that it is capable of generating. We study the implications of this behaviour for the learned representations and also the consequences of fixing it by introducing a notion of self-consistency. Our approach hinges on an alternative construction of the variational approximation distribution to the true posterior of an extended VAE model with a Markov chain alternating between the encoder and the decoder. The method can be used to train a VAE model from scratch or, given an already trained VAE, it can be run as a post-processing step in an entirely self-supervised way without access to the original training data. Our experimental analysis reveals that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks. We provide experimental results on the ColorMnist and CelebA benchmark datasets that quantify the properties of the learned representations and compare the approach with a baseline that is specifically trained for the desired property.


DiffPrune: Neural Network Pruning with Deterministic Approximate Binary Gates and $L_0$ Regularization

arXiv.org Machine Learning

Modern neural network architectures typically have many millions of parameters and can be pruned significantly without substantial loss in effectiveness, which demonstrates that they are over-parameterized. The contribution of this work is two-fold. The first is a method for approximating a multivariate Bernoulli random variable by means of a deterministic and differentiable transformation of any real-valued multivariate random variable. The second is a method for model selection by element-wise multiplication of parameters with approximate binary gates that may be computed deterministically or stochastically and take on exact zero values. Sparsity is encouraged by the inclusion of a surrogate regularization to the $L_0$ loss. Since the method is differentiable, it enables straightforward and efficient learning of model architectures by an empirical risk minimization procedure with stochastic gradient descent and theoretically enables conditional computation during training. The method also supports arbitrary group sparsity over parameters or activations and therefore offers a framework for unstructured or flexible structured model pruning. To conclude, experiments are performed to demonstrate the effectiveness of the proposed approach.
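
The gating idea in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the transformation DiffPrune actually uses, so everything below is an assumption made for illustration: the gate is a stretched-and-clipped sigmoid (one simple way to obtain a deterministic, piecewise-differentiable value that reaches exact zero), the penalty is a plain sum of sigmoids standing in for a count of non-zero gates, and the names `approx_binary_gate`, `l0_surrogate`, `gated_parameters`, and `stretch` are hypothetical.

```python
import math


def approx_binary_gate(alpha, stretch=0.1):
    """Deterministic gate in [0, 1] that can reach exact 0 and exact 1.

    Assumption for illustration only: a sigmoid is stretched to the
    interval (-stretch, 1 + stretch) and then clipped to [0, 1].
    """
    s = 1.0 / (1.0 + math.exp(-alpha))
    return min(1.0, max(0.0, s * (1.0 + 2.0 * stretch) - stretch))


def l0_surrogate(alphas):
    """Smooth stand-in for the number of non-zero gates (sum of sigmoids)."""
    return sum(1.0 / (1.0 + math.exp(-a)) for a in alphas)


def gated_parameters(weights, alphas):
    """Element-wise product of parameters with their gates."""
    return [w * approx_binary_gate(a) for w, a in zip(weights, alphas)]


# Hypothetical parameters and gate logits.
weights = [0.8, -1.5, 0.3, 2.0]
alphas = [4.0, -5.0, 0.0, 6.0]

pruned = gated_parameters(weights, alphas)
penalty = l0_surrogate(alphas)
print(pruned, penalty)
```

In this sketch a strongly negative logit drives its gate to exactly zero, so the corresponding parameter is removed rather than merely shrunk, while the penalty term would push logits downward during training.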


Planning from Pixels using Inverse Dynamics Models

arXiv.org Artificial Intelligence

Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents. We propose a novel way to learn latent world models by learning to predict sequences of future actions conditioned on task completion. These task-conditioned models adaptively focus modeling capacity on task-relevant dynamics, while simultaneously serving as an effective heuristic for planning with sparse rewards. We evaluate our method on challenging visual goal completion tasks and show a substantial increase in performance compared to prior model-free approaches.

Deep reinforcement learning has proven to be a powerful and effective framework for solving a diversity of challenging decision-making problems (Silver et al., 2017a; Berner et al., 2019). However, these algorithms are typically trained to maximize a single reward function, ignoring information that is not directly relevant to the task at hand. This way of learning is in stark contrast to how humans learn (Tenenbaum, 2018). Without being prompted by a specific task, humans can still explore their environment, practice achieving imaginary goals, and in so doing learn about the dynamics of the environment. When subsequently presented with a novel task, humans can utilize this learned knowledge to bootstrap learning -- a property we would like our artificial agents to have. In this work, we investigate one way to bridge this gap by learning world models (Ha & Schmidhuber, 2018) that enable the realization of previously unseen tasks. By modeling the task-agnostic dynamics of an environment, an agent can make predictions about how its own actions may affect the environment state without the need for additional samples from the environment. Prior work has shown that by using powerful function approximators to model environment dynamics, training an agent entirely within its own world models can result in large gains in sample efficiency (Ha & Schmidhuber, 2018).


Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification

arXiv.org Machine Learning

Bayesian neural networks (BNNs) have long been considered an ideal, yet unscalable solution for improving the robustness and the predictive uncertainty of deep neural networks. While they could capture more accurately the posterior distribution of the network parameters, most BNN approaches are either limited to small networks or rely on constraining assumptions such as parameter independence. These drawbacks have enabled the prominence of simple, but computationally heavy approaches such as Deep Ensembles, whose training and testing costs increase linearly with the number of networks. In this work we aim for efficient deep BNNs amenable to complex computer vision architectures, e.g. ResNet50 DeepLabV3+, and tasks, e.g. semantic segmentation, with fewer assumptions on the parameters. We achieve this by leveraging variational autoencoders (VAEs) to learn the interaction and the latent distribution of the parameters at each network layer. Our approach, Latent-Posterior BNN (LP-BNN), is compatible with the recent BatchEnsemble method, leading to highly efficient (in terms of computation and memory during both training and testing) ensembles. LP-BNNs attain competitive results across multiple metrics in several challenging benchmarks for image classification, semantic segmentation and out-of-distribution detection.
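
To make the cost comparison in the abstract above concrete, the following sketch counts the weights of one dense layer under two schemes: independent ensemble members, whose cost grows linearly with the number of networks, and a BatchEnsemble-style layer in which each member's weight matrix is a shared matrix multiplied element-wise by a rank-1 outer product of two member-specific vectors. It covers only this parameter sharing; the VAE components of LP-BNN are not modeled, biases are ignored, and the function names and layer sizes are hypothetical.

```python
def deep_ensemble_params(n_in, n_out, members):
    """Each member stores its own full weight matrix."""
    return members * n_in * n_out


def rank1_ensemble_params(n_in, n_out, members):
    """One shared matrix plus two small vectors per member."""
    return n_in * n_out + members * (n_in + n_out)


def rank1_member_weight(shared, r, s):
    """Member weight: W_i[a][b] = shared[a][b] * r[a] * s[b]."""
    return [
        [shared[a][b] * r[a] * s[b] for b in range(len(s))]
        for a in range(len(r))
    ]


# Hypothetical layer size and ensemble size.
full = deep_ensemble_params(512, 512, members=4)
shared = rank1_ensemble_params(512, 512, members=4)
print(full, shared)

# A tiny member weight built from a shared 2x2 matrix.
w_member = rank1_member_weight([[1, 2], [3, 4]], r=[1, -1], s=[2, 0.5])
print(w_member)
```

Under these assumptions the shared scheme adds only a few thousand parameters per extra member, whereas independent members each add a full copy of the layer.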


Research Progress of News Recommendation Methods

arXiv.org Artificial Intelligence

Because researchers aim to study personalized recommendations for different business fields, a summary of recommendation methods in specific fields is of practical significance. News recommendation was the earliest research field within recommendation systems, and was also the earliest recommendation field to apply the collaborative filtering method. In addition, news is real-time and rich in content, which makes news recommendation more challenging than recommendation in other fields. Thus, this paper summarizes the research progress regarding news recommendation methods. From 2018 to 2020, newly developed news recommendation methods were mainly deep learning-based, attention-based, and knowledge graph-based. As of 2020, there are many news recommendation methods that combine attention mechanisms and knowledge graphs. However, these methods were all developed on top of basic methods (the collaborative filtering method, the content-based recommendation method, and a mixed recommendation method combining the two). In order to give researchers a detailed understanding of the development process of news recommendation methods, the methods surveyed in this paper, which cover nearly 10 years, are divided into three categories according to the above-mentioned basic methods. First, the paper introduces the basic ideas of each category of methods; it then summarizes, for each category and in chronological order of the research results, the recommendation methods that combine it with other methods. Finally, the paper summarizes the challenges confronting news recommendation systems.
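
As a rough illustration of the collaborative filtering method named in the abstract above, the sketch below scores unread articles for one user by summing the cosine similarities of other users who clicked them. This is a generic user-based variant chosen for brevity, not a method taken from the survey; the click matrix, the user names, and the function names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length click vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(clicks, user, k=1):
    """Return the indices of the k best-scoring articles the user has not clicked."""
    target = clicks[user]
    scores = {}
    for other, row in clicks.items():
        if other == user:
            continue
        sim = cosine(target, row)
        for article, clicked in enumerate(row):
            if clicked and not target[article]:
                scores[article] = scores.get(article, 0.0) + sim
    # Highest score first; ties broken by article index.
    return sorted(scores, key=lambda a: (-scores[a], a))[:k]


# Hypothetical click matrix: rows are users, columns are articles 0..3.
clicks = {
    "alice": [1, 1, 0, 0],
    "bob": [1, 1, 1, 0],
    "carol": [0, 0, 0, 1],
}

top = recommend(clicks, "alice")
print(top)
```

Here the article clicked by the most similar user ranks first. A content-based method would instead compare article features with the user's reading history, and a mixed method would combine both scores.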