 Directed Networks


Scalable approximate inference for state space models with normalising flows

arXiv.org Machine Learning

By exploiting mini-batch stochastic gradient optimisation, variational inference has had great success in scaling up approximate Bayesian inference to big data. To date, however, this strategy has only been applicable to models of independent data. Here we extend mini-batch variational methods to state space models of time series data. To do so we introduce a novel generative model as our variational approximation, a local inverse autoregressive flow. This allows a subsequence to be sampled without sampling the entire distribution. Hence we can perform training iterations using short portions of the time series at low computational cost. We illustrate our method on AR(1), Lotka-Volterra and FitzHugh-Nagumo models, achieving accurate parameter estimation in a short time.


Towards Unifying Neural Architecture Space Exploration and Generalization

arXiv.org Machine Learning

In this paper, we address a fundamental research question of significant practical interest: Can certain theoretical characteristics of CNN architectures indicate a priori (i.e., without training) which models with very different numbers of parameters and layers achieve similar generalization performance? To answer this question, we model CNNs from a network science perspective and introduce a new, theoretically grounded, architecture-level metric called NN-Mass. We also integrate, for the first time, the PAC-Bayes theory of generalization with small-world networks to discover new synergies among our proposed NN-Mass metric, architecture characteristics, and model generalization. With experiments on real datasets such as CIFAR-10/100, we provide extensive empirical evidence for our theoretical findings. Finally, we exploit these new insights for model compression and achieve up to 3x fewer parameters and FLOPS, while losing minimal accuracy (e.g., 96.82% vs. 97%) over large CNNs on the CIFAR-10 dataset.


CMTS: Conditional Multiple Trajectory Synthesizer for Generating Safety-critical Driving Scenarios

arXiv.org Machine Learning

Naturalistic driving trajectories are crucial for the performance of autonomous driving algorithms. However, most of the data is collected in safe scenarios, which leads to duplicated trajectories that are easily handled by currently developed algorithms. When considering safety, testing algorithms in near-miss scenarios that rarely show up in off-the-shelf datasets is a vital part of the evaluation. As a remedy, we propose a near-miss data synthesizing framework based on Variational Bayesian methods and term it the Conditional Multiple Trajectory Synthesizer (CMTS). We leverage a generative model conditioned on road maps to bridge safe and collision driving data by representing their distribution in the latent space. By sampling from the near-miss distribution, we can synthesize safety-critical data that is crucial for understanding traffic scenarios but appears in neither the original dataset nor the collision dataset. Our experimental results demonstrate that the augmented dataset covers more kinds of driving scenarios, especially the near-miss ones, which helps improve the trajectory prediction accuracy and the capability of dealing with risky driving scenarios. Data acquisition vehicles are running on roads, and different autonomous driving research institutes have already released datasets containing millions of data points [1] [2].


Efficient Local Causal Discovery Based on Markov Blanket

arXiv.org Artificial Intelligence

We study the problem of local causal discovery, which identifies the direct causes and effects of a target variable of interest in a causal network. The existing constraint-based local causal discovery approaches are inefficient, since they do not take the triangular structure formed by a given variable and its child variables into account when learning the local causal structure, and hence spend much time distinguishing several direct effects. Additionally, these approaches depend on the standard MB (Markov Blanket) or PC (Parent and Children) discovery algorithms, which require many conditional independence tests to obtain the MB or PC sets. To overcome these problems, in this paper we propose a novel Efficient Local Causal Discovery algorithm via MB (ELCD) to identify the direct causes and effects of a given variable. More specifically, we design a new algorithm for Efficient Oriented MB discovery, named EOMB. EOMB not only uses fewer conditional independence tests to identify the MB, but is also able to identify more direct effects of a given variable with the help of triangular causal structures and to determine as many direct causes as possible. In addition, based on the proposed EOMB, ELCD is presented to learn a local causal structure around a target variable. The benefits of ELCD are that it not only determines the direct causes and effects of a given variable accurately, but also runs faster than other local causal discovery algorithms. Experimental results on eight Bayesian networks (BNs) show that our proposed approach performs better than state-of-the-art baseline methods.
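To make the term concrete, here is a minimal sketch of what a Markov blanket is: the parents, children, and other parents of the children (spouses) of a target node. It reads the blanket off a known DAG and is not EOMB or ELCD, which must learn this set from data through conditional independence tests. The toy graph and the function name `markov_blanket` are assumptions made for illustration.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set of parent nodes}."""
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = set()
    for child in children:
        spouses |= parents[child]
    blanket = set(parents.get(target, set())) | children | spouses
    blanket.discard(target)  # the target is not part of its own blanket
    return blanket


# Hypothetical toy DAG: A -> T, T -> C, B -> C, C -> D
toy_dag = {"A": set(), "B": set(), "T": {"A"}, "C": {"T", "B"}, "D": {"C"}}
print(sorted(markov_blanket(toy_dag, "T")))  # ['A', 'B', 'C']
```

In the toy graph, T, B and C form the kind of collider structure in which a spouse (B) enters the blanket only through a shared child (C).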


Complete 2019 Data Science & Machine Learning Bootcamp

#artificialintelligence

Welcome to the Complete Data Science and Machine Learning Bootcamp, the only course you need to learn Python and get into data science. At over 35 hours, this Python course is without a doubt the most comprehensive data science and machine learning course available online. Even if you have zero programming experience, this course will take you from beginner to mastery. The course is taught by the lead instructor at the App Brewery, London's leading in-person programming bootcamp. In the course, you'll be learning the latest tools and technologies that are used by data scientists at Google, Amazon, or Netflix.


Wasserstein Neural Processes

arXiv.org Machine Learning

Neural Processes (NPs) are a class of models that learn a mapping from a context set of input-output pairs to a distribution over functions. They are traditionally trained using maximum likelihood with a KL divergence regularization term. We show that there are desirable classes of problems where NPs, with this loss, fail to learn any reasonable distribution. We also show that this drawback is solved by using approximations of Wasserstein distance which calculates optimal transport distances even for distributions of disjoint support. We give experimental justification for our method and demonstrate performance. These Wasserstein Neural Processes (WNPs) maintain all of the benefits of traditional NPs while being able to approximate a new class of function mappings.
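The point about disjoint support can be illustrated with a small sketch. It compares KL divergence with the exact 1-D Wasserstein-1 distance on histograms over a shared grid; this is not the approximation used for WNPs, and the histograms and function names are assumptions chosen for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two histograms on the same grid."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


def wasserstein_1d(p, q, spacing=1.0):
    """Exact W1 on an evenly spaced 1-D grid: sum of |CDF_p - CDF_q| * spacing."""
    cdf_gap, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap) * spacing
    return total


# Hypothetical histograms with pairwise disjoint support.
p = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
q_near = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
q_far = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]

print(kl_divergence(p, q_near), kl_divergence(p, q_far))    # inf inf
print(wasserstein_1d(p, q_near), wasserstein_1d(p, q_far))  # 2.0 4.0
```

KL is infinite in both cases and so cannot tell the nearer histogram from the farther one, while the Wasserstein distance grows with the size of the shift.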


An Efficient Sampling Algorithm for Non-smooth Composite Potentials

arXiv.org Machine Learning

We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x)- g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function. We propose a new algorithm based on the Metropolis-Hastings framework, and prove that it mixes to within TV distance $\varepsilon$ of the target density in at most $O(d \log (d/\varepsilon))$ iterations. This guarantee extends previous results on sampling from distributions with smooth log densities ($g = 0$) to the more general composite non-smooth case, with the same mixing time up to a multiple of the condition number. Our method is based on a novel proximal-based proposal distribution that can be efficiently computed for a large class of non-smooth functions $g$.
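As a point of reference for the Metropolis-Hastings framework, the sketch below samples from a 1-D composite density $p(x) \propto \exp(-f(x) - g(x))$ with a plain Gaussian random-walk proposal. It is not the proximal-based proposal of the paper and carries none of its mixing guarantees; the choices $f(x) = x^2/2$, $g(x) = |x|$, the step size, and all names are assumptions.

```python
import math
import random


def f(x):
    """Smooth, strongly convex part (assumed example)."""
    return 0.5 * x * x


def g(x):
    """Convex, Lipschitz, non-smooth part (assumed example)."""
    return abs(x)


def random_walk_mh(n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis-Hastings targeting exp(-f(x) - g(x))."""
    rng = random.Random(seed)
    x = x0
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Symmetric proposal, so the acceptance ratio is the density ratio.
        log_ratio = (f(x) + g(x)) - (f(proposal) + g(proposal))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


samples, acceptance_rate = random_walk_mh(20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance rate {acceptance_rate:.2f}, mean {mean:.3f}, variance {variance:.3f}")
```

For this target the density is symmetric about zero, so the sample mean should settle near zero. The acceptance step evaluates $g$ only pointwise and therefore needs no gradient of the non-smooth term.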


Entropy Penalty: Towards Generalization Beyond the IID Assumption

arXiv.org Machine Learning

It has been shown that instead of learning actual object features, deep networks tend to exploit non-robust (spurious) discriminative features that are shared between training and test sets. Therefore, while they achieve state-of-the-art performance on such test sets, they generalize poorly to out-of-distribution (OOD) samples, where the IID (independent and identically distributed) assumption breaks and the distribution of non-robust features shifts. Through theoretical and empirical analysis, we show that this happens because maximum likelihood training (without appropriate regularization) leads the model to depend on all the correlations (including spurious ones) present between inputs and targets in the dataset. We then show evidence that the information bottleneck (IB) principle can address this problem. To do so, we propose a regularization approach based on IB, called Entropy Penalty, that reduces the model's dependence on spurious features, that is, features corresponding to such spurious correlations. This allows deep networks trained with Entropy Penalty to generalize well even under distribution shift of spurious features. As a controlled test-bed for evaluating our claim, we train deep networks with Entropy Penalty on a colored MNIST (C-MNIST) dataset and show that it is able to generalize well on vanilla MNIST, MNIST-M and SVHN datasets in addition to an OOD version of C-MNIST itself. The baseline regularization methods we compare against fail to generalize on this test-bed. An example of a non-robust feature is the presence of desert in camel images, which may correlate well with this object class. More realistically, models can learn to exploit the abundance of input-target correlations present in datasets, not all of which may be invariant under different environments. Interestingly, such classifiers can achieve good performance on test sets which share the same non-robust features.
However, due to this exploitation, these classifiers perform poorly under distribution shift (Geirhos et al., 2018a; Hendrycks & Dietterich, 2019) because it violates the IID assumption which is the foundation of existing generalization theory (Bartlett & Mendelson, 2002; McAllester, 1999b;a).


Tutorial on Implied Posterior Probability for SVMs

arXiv.org Machine Learning

Department of Data Science, Medical Data Science Ltd., Bulgaria. Abstract: Implied posterior probability of a given model (say, Support Vector Machines (SVM)) at a point x is an estimate of the class posterior probability pertaining to the class of functions of the model applied to a given dataset. It can be regarded as a score (or estimate) for the true posterior probability, which can then be calibrated/mapped onto expected (non-implied by the model) posterior probability implied by the underlying functions, which have generated the data. In this tutorial we discuss how to compute implied posterior probabilities of SVMs for the binary classification case as well as how to calibrate them via a standard method of isotonic regression. Keywords: Posterior probability, Bayes rule, Classification, SVMs. 1. Introduction: The implied posterior probability method for estimating class posterior probability has recently been proposed (Nalbantov and Ivanov, 2019). The method provides a score (or estimate) for the true posterior probability, which can then be calibrated/mapped onto expected (non-implied by the model) posterior probability implied by the underlying functions, which have generated the data. The main difference with other methods for solving this problem is the non-reliance on the original model built on the data to estimate posterior probabilities for points which do not belong to the separation surface of the model. Rather, the estimates are based on the class of functions used to build the (original) model, as applied to different versions of the dataset, where the relative weight of the instances varies between the classes. For each such relative weight a different model is built, which is relevant for the estimation of a particular value of the posterior probability.
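The calibration step mentioned above, isotonic regression, can be sketched with the pool-adjacent-violators algorithm. The sketch fits a non-decreasing sequence to binary labels that have already been sorted by increasing score; it does not compute implied posterior probabilities, and the data and the name `isotonic_fit` are assumptions.

```python
def isotonic_fit(values, weights=None):
    """Pool Adjacent Violators: least-squares non-decreasing fit to `values`."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [mean, total weight, number of points]
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical binary labels, ordered by increasing classifier score.
labels_sorted_by_score = [0, 0, 1, 0, 1, 1, 0, 1]
calibrated = isotonic_fit(labels_sorted_by_score)
print([round(c, 3) for c in calibrated])
# [0.0, 0.0, 0.5, 0.5, 0.667, 0.667, 0.667, 1.0]
```

Each fitted value is the fraction of positive labels in its pooled block, which is why the output can be read as a calibrated probability for scores falling in that block.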


Localised Generative Flows

arXiv.org Machine Learning

We argue that flow-based density models based on continuous bijections are limited in their ability to learn target distributions with complicated topologies, and propose localised generative flows (LGFs) to address this problem. LGFs are composed of stacked continuous mixtures of bijections, which enables each bijection to learn a local region of the target rather than its entirety. Our method is a generalisation of existing flow-based methods, which can be used without modification as the basis for an LGF model. Unlike normalising flows, LGFs do not permit exact computation of log likelihoods, but we propose a simple variational scheme that performs well in practice. We show empirically that LGFs yield improved performance across a variety of density estimation tasks. 1 Introduction: Flow-based generative models, often referred to as normalising flows, have become popular methods for density estimation because of their flexibility, expressiveness, and tractable likelihoods. Given the problem of learning an unknown target density $p_X^\star$ on a data space $\mathcal{X}$, normalising flows model $p_X^\star$ as the marginal of $X$ obtained by the generative process $Z \sim p_Z$, $X := g^{-1}(Z)$ (1), where $p_Z$ is a prior density on a space $\mathcal{Z}$, and $g: \mathcal{X} \to \mathcal{Z}$ is a bijection. The parameters of $g$ can be learned via maximum likelihood given i.i.d. samples. To be effective, a normalising flow model must specify an expressive family of bijections with tractable Jacobians. Affine coupling layers (Dinh et al., 2014; 2016), autoregressive transformations (Germain et al., 2015; Papamakarios et al., 2017), ODE-based transformations (Grathwohl et al., 2018), and invertible ResNet blocks (Behrmann et al., 2019) are all examples of such bijections that can be composed to produce complicated flows. These models have demonstrated significant promise in their ability to model complex datasets (Papamakarios et al., 2017) and to synthesise novel data points (Kingma & Dhariwal, 2018).
However, in all these cases, $g$ is continuous in $x$.
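For orientation, the tractable likelihood of a normalising flow comes from the standard change-of-variables formula, sketched here under the extra assumption that $g$ is differentiable with Jacobian $Dg$. The symbol $p_X$ for the model density and the affine example are illustrative assumptions, not notation taken from the paper.

```latex
% Model density induced by Z ~ p_Z, X := g^{-1}(Z), for a differentiable bijection g
p_X(x) = p_Z\bigl(g(x)\bigr)\,\bigl|\det Dg(x)\bigr|,
\qquad
\log p_X(x) = \log p_Z\bigl(g(x)\bigr) + \log\bigl|\det Dg(x)\bigr|.

% Illustrative 1-D case: g(x) = (x - \mu)/\sigma with \sigma > 0 and p_Z = \mathcal{N}(0, 1)
\log p_X(x) = -\tfrac{1}{2}\Bigl(\tfrac{x - \mu}{\sigma}\Bigr)^{2}
              - \tfrac{1}{2}\log(2\pi) - \log\sigma,
\quad\text{i.e. } X \sim \mathcal{N}(\mu, \sigma^{2}).
```

The $\log\bigl|\det Dg(x)\bigr|$ term is the reason the family of bijections needs tractable Jacobians.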