Goto

Collaborating Authors

 Directed Networks


Efficient Bayesian Experimental Design for Implicit Models

arXiv.org Machine Learning

Bayesian experimental design involves the optimal allocation of resources in an experiment, with the aim of optimising cost and performance. For implicit models, where the likelihood is intractable but sampling from the model is possible, this task is particularly difficult and therefore largely unexplored. This is mainly due to technical difficulties associated with approximating posterior distributions and utility functions. We devise a novel experimental design framework for implicit models that improves upon previous work in two ways. First, we use the mutual information between parameters and data as the utility function, which has previously not been feasible. We achieve this by utilising Likelihood-Free Inference by Ratio Estimation (LFIRE) to approximate posterior distributions, instead of the traditional approximate Bayesian computation or synthetic likelihood methods. Secondly, we use Bayesian optimisation in order to solve the optimal design problem, as opposed to the typically used grid search. We find that this increases efficiency and allows us to consider higher design dimensions.


Dynamic Likelihood-free Inference via Ratio Estimation (DIRE)

arXiv.org Machine Learning

Parametric statistical models that are implicitly defined in terms of a stochastic data generating process are used in a wide range of scientific disciplines because they enable accurate modeling. However, learning the parameters from observed data is generally very difficult because their likelihood function is typically intractable. Likelihood-free Bayesian inference methods have been proposed which include the frameworks of approximate Bayesian computation (ABC), synthetic likelihood, and its recent generalization that performs likelihood-free inference by ratio estimation (LFIRE). A major difficulty in all these methods is choosing summary statistics that reduce the dimensionality of the data to facilitate inference. While several methods for choosing summary statistics have been proposed for ABC, the literature for synthetic likelihood and LFIRE is very thin to date. We here address this gap in the literature, focusing on the important special case of time-series models. We show that convolutional neural networks trained to predict the input parameters from the data provide suitable summary statistics for LFIRE. On a wide range of time-series models, a single neural network architecture produced equally or more accurate posteriors than alternative methods.


On PAC-Bayesian Bounds for Random Forests

arXiv.org Machine Learning

Existing guarantees in terms of rigorous upper bounds on the generalization error for the original random forest algorithm, one of the most frequently used machine learning methods, are unsatisfying. We discuss and evaluate various PAC-Bayesian approaches to derive such bounds. The bounds do not require additional hold-out data, because the out-of-bag samples from the bagging in the training process can be exploited. A random forest predicts by taking a majority vote of an ensemble of decision trees. The first approach is to bound the error of the vote by twice the error of the corresponding Gibbs classifier (classifying with a single member of the ensemble selected at random). However, this approach does not take into account the effect of averaging out of errors of individual classifiers when taking the majority vote. This effect provides a significant boost in performance when the errors are independent or negatively correlated, but when the correlations are strong the advantage from taking the majority vote is small. The second approach based on PAC-Bayesian C-bounds takes dependencies between ensemble members into account, but it requires estimating correlations between the errors of the individual classifiers. When the correlations are high or the estimation is poor, the bounds degrade. In our experiments, we compute generalization bounds for random forests on various benchmark data sets. Because the individual decision trees already perform well, their predictions are highly correlated and the C-bounds do not lead to satisfactory results. For the same reason, the bounds based on the analysis of Gibbs classifiers are typically superior and often reasonably tight. Bounds based on a validation set coming at the cost of a smaller training set gave better performance guarantees, but worse performance in most experiments.


Training Dynamic Exponential Family Models with Causal and Lateral Dependencies for Generalized Neuromorphic Computing

arXiv.org Machine Learning

Neuromorphic hardware platforms, such as Intel's Loihi chip, support the implementation of Spiking Neural Networks (SNNs) as an energy-efficient alternative to Artificial Neural Networks (ANNs). SNNs are networks of neurons with internal analogue dynamics that communicate by means of binary time series. In this work, a probabilistic model is introduced for a generalized set-up in which the synaptic time series can take values in an arbitrary alphabet and are characterized by both causal and instantaneous statistical dependencies. The model, which can be considered as an extension of exponential family harmoniums to time series, is introduced by means of a hybrid directed-undirected graphical representation. Furthermore, distributed learning rules are derived for Maximum Likelihood and Bayesian criteria under the assumption of fully observed time series in the training set.


Accelerating Deep Learning with Memcomputing

arXiv.org Artificial Intelligence

Restricted Boltzmann machines (RBMs) and their extensions, called 'deep-belief networks', are powerful neural networks that have found applications in the fields of machine learning and artificial intelligence. The standard way to training these models resorts to an iterative unsupervised procedure based on Gibbs sampling, called 'contrastive divergence' (CD), and additional supervised tuning via back-propagation. However, this procedure has been shown not to follow any gradient and can lead to suboptimal solutions. In this paper, we show an efficient alternative to CD by means of simulations of digital memcomputing machines (DMMs). We test our approach on pattern recognition using a modified version of the MNIST data set. DMMs sample effectively the vast phase space given by the model distribution of the RBM, and provide a very good approximation close to the optimum. This efficient search significantly reduces the number of pretraining iterations necessary to achieve a given level of accuracy, as well as a total performance gain over CD. In fact, the acceleration of pretraining achieved by simulating DMMs is comparable to, in number of iterations, the recently reported hardware application of the quantum annealing method on the same network and data set. Notably, however, DMMs perform far better than the reported quantum annealing results in terms of quality of the training. We also compare our method to advances in supervised training, like batch-normalization and rectifiers, that work to reduce the advantage of pretraining. We find that the memcomputing method still maintains a quality advantage ($>1\%$ in accuracy, and a $20\%$ reduction in error rate) over these approaches. Furthermore, our method is agnostic about the connectivity of the network. Therefore, it can be extended to train full Boltzmann machines, and even deep networks at once.


Stochastic Gradient MCMC for State Space Models

arXiv.org Machine Learning

State space models (SSMs) are a flexible approach to modeling complex time series. However, inference in SSMs is often computationally prohibitive for long time series. Stochastic gradient MCMC (SGMCMC) is a popular method for scalable Bayesian inference for large independent data. Unfortunately when applied to dependent data, such as in SSMs, SGMCMC's stochastic gradient estimates are biased as they break crucial temporal dependencies. To alleviate this, we propose stochastic gradient estimators that control this bias by performing additional computation in a `buffer' to reduce breaking dependencies. Furthermore, we derive error bounds for this bias and show a geometric decay under mild conditions. Using these estimators, we develop novel SGMCMC samplers for discrete, continuous and mixed-type SSMs. Our experiments on real and synthetic data demonstrate the effectiveness of our SGMCMC algorithms compared to batch MCMC, allowing us to scale inference to long time series with millions of time points.


Model Selection Techniques -- An Overview

arXiv.org Machine Learning

Abstract--In the era of "big data", analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques that arise from researches in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performances. The purpose of this article is to bring a comprehensive overview of them, in terms of their motivation, large sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-ofthe-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection. Vast development in hardware storage, precision instrument manufacture, economic globalization, etc. have generated huge volumes of data that can be analyzed to extract useful information. Typical statistical inference or machine learning procedures learn from and make predictions on data by fitting parametric or nonparametric models (in a broad sense). However, there exists no model that is universally suitable for any data and goal. This research was funded in part by the Defense Advanced Research Projects Agency (DARPA) under grant number W911NF-18-1-0134. J. Ding and Y. Yang are with the School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, United States. V. Tarokh is with the Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27708, United States. Therefore, a crucial step in a typical data analysis is to consider a set of candidate models (referred to as the model class), and then select the most appropriate one. In other words, model selection is the task of selecting a statistical model from a model class, given a set of data. There have been many overview papers on model selection scattered in the communities of signal processing [1], statistics [2], machine learning [3], epidemiology [4], chemometrics [5], ecology and evolution [6]. Despite the abundant literature on model selection, existing overviews usually focus on derivations, descriptions, or applications of particular model selection principles.


Security Matters: A Survey on Adversarial Machine Learning

arXiv.org Machine Learning

Adversarial machine learning is a fast growing research area, which considers the scenarios when machine learning systems may face potential adversarial attackers, who intentionally synthesize input data to make a well-trained model to make mistake. It always involves a defending side, usually a classifier, and an attacking side that aims to cause incorrect output. The earliest studies on the adversarial examples for machine learning algorithms start from the information security area, which considers a much wider varieties of attacking methods. But recent research focus that popularized by the deep learning community places strong emphasis on how the "imperceivable" perturbations on the normal inputs may cause dramatic mistakes by the deep learning with supposed super-human accuracy. This paper serves to give a comprehensive introduction to a range of aspects of the adversarial deep learning topic, including its foundations, typical attacking and defending strategies, and some extended studies.


Properties of an N Time-Slice Dynamic Chain Event Graph

arXiv.org Machine Learning

A Dynamic Bayesian Network (DBN) [1-3] is a widely used family of graphical model for representing and reasoning within dynamic systems whose progress is recorded over a discrete time intervals [4-10]. However, in some context a DBN model is not able to represent all structural information of the target process [11]. This is particularly the case when the process is more naturally described by concatenations of unfolding events rather than by a product space of preassigned set of random variables. In other situations, a relevant statement corresponding to a conditioned variable cannot be directly incorporated into a DBN model using directed edges because it is valid only for a certain combinations of values assumed by the conditioning variables. In the literature, this type of statements is sometimes referred to context-specific information [12, 13]. To circumvent these issues, collections of networks and embellishments in the form of trees have been added to the DBN framework and computationally implemented using the object-oriented programming paradigm [14]: for instance, see the developments on context-specific BNs [11, 13, 15], Bayesian Multinet [16], Similarity Networks [17] and Object-Oriented BNs [18, 19].


Cost-Sensitive Robustness against Adversarial Examples

arXiv.org Machine Learning

Despite the exceptional performance of deep neural networks (DNNs) on various machine learning tasks such as malware detection (Saxe & Berlin, 2015), face recognition (Parkhi et al., 2015) and autonomous driving (Bojarski et al., 2016), recent studies (Szegedy et al., 2014; Goodfellow et al., 2015) have shown that deep learning models are vulnerable to misclassifying inputs, known as adversarial examples, that are crafted with targeted but visually-imperceptible perturbations. While several defense mechanisms have been proposed and empirically demonstrated to be successful against existing particular attacks (Papernot et al., 2016; Goodfellow et al., 2015), new attacks (Carlini & Wagner, 2017; Tramèr et al., 2017; Athalye et al., 2018) are repeatedly found that circumvent such defenses. To end this arm race, recent works (Wong & Kolter, 2018; Raghunathan et al., 2018; Wong et al., 2018; Wang et al., 2018) propose methods to certify examples to be robust against some specific norm-bounded adversarial perturbations for given inputs and to train models to optimize for certifiable robustness. However, all of the aforementioned methods aim at improving the overall robustness of the classifier. This means that the methods to improve robustness are designed to prevent seed examples in any class from being misclassified as any other classes. Achieving such a goal requires producing a perfect classifier, and has, unsurprisingly, remained elusive. Indeed, Mahloujifar et al. (2018) proved that if the metric probability space is concentrated, overall adversarial robustness is unattainable for any classifier with initial constant error. We argue that overall robustness may not be the appropriate criteria for measuring system performance in securitysensitive applications, since only certain kinds of adversarial misclassifications pose meaningful threats or provide value for adversaries.