Goto

Collaborating Authors

 Bayesian Inference


Input Warping for Bayesian Optimization of Non-stationary Functions

arXiv.org Machine Learning

Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions. The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization. Although Gaussian processes provide a flexible prior over functions which can be queried efficiently, there are various classes of functions that remain difficult to model. One of the most frequently occurring of these is the class of non-stationary functions. The optimization of the hyperparameters of machine learning algorithms is a problem domain in which parameters are often manually transformed a priori, for example by optimizing in "log-space," to mitigate the effects of spatially-varying length scale. We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function. We further extend the warping framework to multi-task Bayesian optimization so that multiple tasks can be warped into a jointly stationary space. On a set of challenging benchmark optimization tasks, we observe that the inclusion of warping greatly improves on the state-of-the-art, producing better results faster and more reliably.


Learning Latent Variable Gaussian Graphical Models

arXiv.org Machine Learning

Gaussian graphical models (GGM) have been widely used in many high-dimensional applications ranging from biological and financial data to recommender systems. Sparsity in GGM plays a central role both statistically and computationally. Unfortunately, real-world data often does not fit well to sparse graphical models. In this paper, we focus on a family of latent variable Gaussian graphical models (LVGGM), where the model is conditionally sparse given latent variables, but marginally non-sparse. In LVGGM, the inverse covariance matrix has a low-rank plus sparse structure, and can be learned in a regularized maximum likelihood framework. We derive novel parameter estimation error bounds for LVGGM under mild conditions in the high-dimensional setting. These results complement the existing theory on the structural learning, and open up new possibilities of using LVGGM for statistical inference.


Predictive Entropy Search for Efficient Global Optimization of Black-box Functions

arXiv.org Machine Learning

We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.


Bayesian calibration for forensic evidence reporting

arXiv.org Machine Learning

We introduce a Bayesian solution for the problem in forensic speaker recognition, where there may be very little background material for estimating score calibration parameters. We work within the Bayesian paradigm of evidence reporting and develop a principled probabilistic treatment of the problem, which results in a Bayesian likelihood-ratio as the vehicle for reporting weight of evidence. We show in contrast, that reporting a likelihood-ratio distribution does not solve this problem. Our solution is experimentally exercised on a simulated forensic scenario, using NIST SRE'12 scores, which demonstrates a clear advantage for the proposed method compared to the traditional plugin calibration recipe.


Augur: a Modeling Language for Data-Parallel Probabilistic Inference

arXiv.org Artificial Intelligence

It is time-consuming and error-prone to implement inference procedures for each new probabilistic model. Probabilistic programming addresses this problem by allowing a user to specify the model and having a compiler automatically generate an inference procedure for it. For this approach to be practical, it is important to generate inference code that has reasonable performance. In this paper, we present a probabilistic programming language and compiler for Bayesian networks designed to make effective use of data-parallel architectures such as GPUs. Our language is fully integrated within the Scala programming language and benefits from tools such as IDE support, type-checking, and code completion. We show that the compiler can generate data-parallel inference code scalable to thousands of GPU cores by making use of the conditional independence relationships in the Bayesian network.


Learning directed acyclic graphs via bootstrap aggregating

arXiv.org Machine Learning

Probabilistic graphical models are graphical representations of probability distributions. Graphical models have applications in many fields including biology, social sciences, linguistic, neuroscience. In this paper, we propose directed acyclic graphs (DAGs) learning via bootstrap aggregating. The proposed procedure is named as DAGBag. Specifically, an ensemble of DAGs is first learned based on bootstrap resamples of the data and then an aggregated DAG is derived by minimizing the overall distance to the entire ensemble. A family of metrics based on the structural hamming distance is defined for the space of DAGs (of a given node set) and is used for aggregation. Under the high-dimensional-low-sample size setting, the graph learned on one data set often has excessive number of false positive edges due to over-fitting of the noise. Aggregation overcomes over-fitting through variance reduction and thus greatly reduces false positives. We also develop an efficient implementation of the hill climbing search algorithm of DAG learning which makes the proposed method computationally competitive for the high-dimensional regime. The DAGBag procedure is implemented in the R package dagbag.


ExpertBayes: Automatically refining manually built Bayesian networks

arXiv.org Machine Learning

Bayesian network structures are usually built using only the data and starting from an empty network or from a naive Bayes structure. Very often, in some domains, like medicine, a prior structure knowledge is already known. This structure can be automatically or manually refined in search for better performance models. In this work, we take Bayesian networks built by specialists and show that minor perturbations to this original network can yield better classifiers with a very small computational cost, while maintaining most of the intended meaning of the original model.


Compressed Gaussian Process

arXiv.org Machine Learning

Nonparametric regression for massive numbers of samples (n) and features (p) is an increasingly important problem. In big n settings, a common strategy is to partition the feature space, and then separately apply simple models to each partition set. We propose an alternative approach, which avoids such partitioning and the associated sensitivity to neighborhood choice and distance metrics, by using random compression combined with Gaussian process regression. The proposed approach is particularly motivated by the setting in which the response is conditionally independent of the features given the projection to a low dimensional manifold. Conditionally on the random compression matrix and a smoothness parameter, the posterior distribution for the regression surface and posterior predictive distributions are available analytically. Running the analysis in parallel for many random compression matrices and smoothness parameters, model averaging is used to combine the results. The algorithm can be implemented rapidly even in very big n and p problems, has strong theoretical justification, and is found to yield state of the art predictive performance.


Learning Latent Block Structure in Weighted Networks

arXiv.org Machine Learning

Networks are an increasingly important form of structured data consisting of interactions between pairs of individuals in large social and biological data sets. Unlike attribute data where each observation is associated with an individual, network data is represented by graphs, where individuals are vertices and interactions are edges. Because vertices are pairwise related, network data violates traditional assumptions of attribute data, such as independence. This intrinsic difference in structure prompts the development of new tools for handling network data. In social and biological networks, vertices often play distinct structural roles in generating the network's large-scale structure. To identify such latent structural roles, we aim to identify a network partition that groups together vertices with similar group-level connectivity patterns. We call these groups "communities," and their inference produces a compact description of the large-scale 1 (a) Assortative (b) Disassortative (c) Core-Periphery (d) Ordered Figure 1: Examples of structure that can be learned using the SBM. The first row shows the abstract connections between four groups (blue, red, green, and purple). The second row shows the'block' structure found in the adjacency matrix after sorting by group membership; black corresponds to edges and white corresponds to non-edges.


Topological and Statistical Behavior Classifiers for Tracking Applications

arXiv.org Machine Learning

We introduce the first unified theory for target tracking using Multiple Hypothesis Tracking, Topological Data Analysis, and machine learning. Our string of innovations are 1) robust topological features are used to encode behavioral information, 2) statistical models are fitted to distributions over these topological features, and 3) the target type classification methods of Wigren and Bar Shalom et al. are employed to exploit the resulting likelihoods for topological features inside of the tracking procedure. To demonstrate the efficacy of our approach, we test our procedure on synthetic vehicular data generated by the Simulation of Urban Mobility package.