Goto

Collaborating Authors

 Uncertainty


Stochastic Gradient Hamiltonian Monte Carlo

arXiv.org Machine Learning

Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient computation for simulation of the Hamiltonian dynamical system-such computation is infeasible in problems involving a large sample size or streaming data. Instead, we must rely on a noisy gradient estimate computed from a subset of the data. In this paper, we explore the properties of such a stochastic gradient HMC approach. Surprisingly, the natural implementation of the stochastic approximation can be arbitrarily bad. To address this problem we introduce a variant that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution. Results on simulated data validate our theory. We also provide an application of our methods to a classification task using neural networks and to online Bayesian matrix factorization.


Learning modular structures from network data and node variables

arXiv.org Machine Learning

A standard technique for understanding underlying dependency structures among a set of variables posits a shared conditional probability distribution for the variables measured on individuals within a group. This approach is often referred to as module networks, where individuals are represented by nodes in a network, groups are termed modules, and the focus is on estimating the network structure among modules. However, estimation solely from node-specific variables can lead to spurious dependencies, and unverifiable structural assumptions are often used for regularization. Here, we propose an extended model that leverages direct observations about the network in addition to node-specific variables. By integrating complementary data types, we avoid the need for structural assumptions. We illustrate theoretical and practical significance of the model and develop a reversible-jump MCMC learning procedure for learning modules and model parameters. We demonstrate the method accuracy in predicting modular structures from synthetic data and capability to learn influence structures in twitter data and regulatory modules in the Mycobacterium tuberculosis gene regulatory network.


A Hybrid Monte Carlo Architecture for Parameter Optimization

arXiv.org Machine Learning

Much recent research has been conducted in the area of Bayesian learning, particularly with regard to the optimization of hyper-parameters via Gaussian process regression. The methodologies rely chiefly on the method of maximizing the expected improvement of a score function with respect to adjustments in the hyper-parameters. In this work, we present a novel algorithm that exploits notions of confidence intervals and uncertainties to enable the discovery of the best optimal within a targeted region of the parameter space. We demonstrate the efficacy of our algorithm with respect to machine learning problems and show cases where our algorithm is competitive with the method of maximizing expected improvement.


Reproducing kernel Hilbert space based estimation of systems of ordinary differential equations

arXiv.org Machine Learning

Nonlinear systems of differential equations have attracted the interest in fields like system biology, ecology or biochemistry, due to their flexibility and their ability to describe dynamical systems. Despite the importance of such models in many branches of science they have not been the focus of systematic statistical analysis until recently. In this work we propose a general approach to estimate the parameters of systems of differential equations measured with noise. Our methodology is based on the maximization of the penalized likelihood where the system of differential equations is used as a penalty. To do so, we use a Reproducing Kernel Hilbert Space approach that allows us to formulate the estimation problem as an unconstrained numeric maximization problem easy to solve. The proposed method is tested with synthetically simulated data and it is used to estimate the unobserved transcription factor CdaR in Steptomyes coelicolor using gene expression data of the genes it regulates. Keywords: System of ordinary differential equations, differential operator, reproducing kernel Hilbert space, gene regulatory network 1. Introduction Despite the fact that differential equations are a common modelling tool within science and engineering, statistical methods for estimating such models have only received widespread attention during the last few years. The difficulty of solving differential equations in general has been a major stumbling block for efficient statistical procedures.


Factored Performance Functions with Structural Representation in Continuous Time Bayesian Networks

AAAI Conferences

The continuous time Bayesian network (CTBN) is a probabilistic graphical model that enables reasoning about complex, interdependent, and continuous-time subsystems. The model uses nodes to denote subsystems and arcs to denote conditional dependence. This dependence manifests in how the dynamics of a subsystem change based on the current states of its parents in the network. While the original CTBN definition allows users to specify the dynamics of how the system evolves, users might also want to place value expressions over the dynamics of the model in the form of performance functions. We formalize these performance functions for the CTBN and show how they can be factored in the same way as the network, allowing what we argue is a more intuitive and explicit representation. For cases in which a performance function must involve multiple nodes, we show how to augment the structure of the CTBN to account for the performance interaction while maintaining the factorization of a single performance function for each node.


An Empirical Evaluation of Costs and Benefits of Simplifying Bayesian Networks by Removing Weak Arcs

AAAI Conferences

We report the results of an empirical evaluation of structural simplification of Bayesian networks by removing weak arcs. We conduct a series of experiments on six networks built from real data sets selected from the UC Irvine Machine Learning Repository. We systematically remove arcs from the weakest to the strongest, relying on four measures of arc strength, and measure the classification accuracy of the resulting simplified models. Our results show that removing up to roughly 20 percent of the weakest arcs in a network has minimal effect on its classification accuracy. At the same time, structural simplification of networks leads to significant reduction of both the amount of memory taken by the clique tree and the amount of computation needed to perform inference.


A Novel Methodology for Processing Probabilistic Knowledge Bases Under Maximum Entropy

AAAI Conferences

Probabilistic reasoning under the so-called principle of maximum entropy is a viable and convenient alternative to Bayesian networks, relieving the user from providing complete (local) probabilistic information and observing rigorous conditional independence assumptions. In this paper, we present a novel approach to performing computational MaxEnt reasoning that makes use of symbolic computations instead of graph-based techniques. Given a probabilistic knowledge base, we encode the MaxEnt optimization problem into a system of polynomial equations, and then apply Grรถbner basis theory to find MaxEnt inferences as solutions to the polynomials. We illustrate our approach with an example of a knowledge base that represents findings on fraud detection in enterprises.


Special Track on Uncertain Reasoning

AAAI Conferences

This meeting at FLAIRS-27 will mark the nineteenth in the series. Like the previous tracks, the special track seeks to bring together researchers working on broad issues related to reasoning under uncertainty. Papers on all aspects of uncertain reasoning were invited.


A Hybrid Fuzzy-Firefly Approach for Rule-Based Classification

AAAI Conferences

Pattern classification algorithms have been applied in data mining and signal processing to extract the knowledge from data in a wide range of applications. The Fuzzy inference systems have successfully been used to extract rules in rule-based applications. In this paper, a novel hybrid methodology using: (i) fuzzy logic (in form of if-then rules) and (ii) a bio-inspired optimization technique (firefly algorithm) is proposed to improve performance and accuracy of classification task. Experiments are done using nine standard data sets in UCI machine learning repository. The results show that overall the accuracy and performance of our classification are better or very competitive compared to others reported in literature.


Why (and When and How) Contrastive Divergence Works

arXiv.org Machine Learning

Contrastive divergence (CD) is a promising method of inference in high dimensional distributions with intractable normalizing constants, however, the theoretical foundations justifying its use are somewhat shaky. This document proposes a framework for understanding CD inference, how/when it works, and provides multiple justifications for the CD moment conditions, including framing them as a variational approximation. Algorithms for performing inference are discussed and are applied to social network data using an exponential-family random graph models (ERGM). The framework also provides guidance about how to construct MCMC kernels providing good CD inference, which turn out to be quite different from those used typically to provide fast global mixing.