Goto

Collaborating Authors

 Bayesian Inference


Scalable Natural Gradient Langevin Dynamics in Practice

arXiv.org Machine Learning

Stochastic Gradient Langevin Dynamics (SGLD) is a sampling scheme for Bayesian modeling adapted to large datasets and models. SGLD relies on the injection of Gaussian Noise at each step of a Stochastic Gradient Descent (SGD) update. In this scheme, every component in the noise vector is independent and has the same scale, whereas the parameters we seek to estimate exhibit strong variations in scale and significant correlation structures, leading to poor convergence and mixing times. We compare different preconditioning approaches to the normalization of the noise vector and benchmark these approaches on the following criteria: 1) mixing times of the multivariate parameter vector, 2) regularizing effect on small dataset where it is easy to overfit, 3) covariate shift detection and 4) resistance to adversarial examples.


Probabilistic Model-Agnostic Meta-Learning

arXiv.org Machine Learning

Meta-learning for few-shot learning entails acquiring a prior over previous tasks and experiences, such that new tasks be learned from small amounts of data. However, a critical challenge in few-shot learning is task ambiguity: even when a powerful prior can be meta-learned from a large number of prior tasks, a small dataset for a new task can simply be too ambiguous to acquire a single model (e.g., a classifier) for that task that is accurate. In this paper, we propose a probabilistic meta-learning algorithm that can sample models for a new task from a model distribution. Our approach extends model-agnostic meta-learning, which adapts to new tasks via gradient descent, to incorporate a parameter distribution that is trained via a variational lower bound. At meta-test time, our algorithm adapts via a simple procedure that injects noise into gradient descent, and at meta-training time, the model is trained such that this stochastic adaptation procedure produces samples from the approximate model posterior. Our experimental results show that our method can sample plausible classifiers and regressors in ambiguous few-shot learning problems.


MEBN-RM: A Mapping between Multi-Entity Bayesian Network and Relational Model

arXiv.org Machine Learning

Multi-Entity Bayesian Network (MEBN) is a knowledge representation formalism combining Bayesian Networks (BN) with First-Order Logic (FOL). MEBN has sufficient expressive power for general-purpose knowledge representation and reasoning. Developing a MEBN model to support a given application is a challenge, requiring definition of entities, relationships, random variables, conditional dependence relationships, and probability distributions. When available, data can be invaluable both to improve performance and to streamline development. By far the most common format for available data is the relational database (RDB). Relational databases describe and organize data according to the Relational Model (RM). Developing a MEBN model from data stored in an RDB therefore requires mapping between the two formalisms. This paper presents MEBN-RM, a set of mapping rules between key elements of MEBN and RM. We identify links between the two languages (RM and MEBN) and define four levels of mapping from elements of RM to elements of MEBN. These definitions are implemented in the MEBN-RM algorithm, which converts a relational schema in RM to a partial MEBN model. Through this research, the software has been released as a MEBN-RM open-source software tool. The method is illustrated through two example use cases using MEBN-RM to develop MEBN models: a Critical Infrastructure Defense System and a Smart Manufacturing System.


Robust and Scalable Models of Microbiome Dynamics

arXiv.org Machine Learning

Microbes are everywhere, including in and on our bodies, and have been shown to play key roles in a variety of prevalent human diseases. Consequently, there has been intense interest in the design of bacteriotherapies or "bugs as drugs," which are communities of bacteria administered to patients for specific therapeutic applications. Central to the design of such therapeutics is an understanding of the causal microbial interaction network and the population dynamics of the organisms. In this work we present a Bayesian nonparametric model and associated efficient inference algorithm that addresses the key conceptual and practical challenges of learning microbial dynamics from time series microbe abundance data. These challenges include high-dimensional (300+ strains of bacteria in the gut) but temporally sparse and non-uniformly sampled data; high measurement noise; and, nonlinear and physically non-negative dynamics. Our contributions include a new type of dynamical systems model for microbial dynamics based on what we term interaction modules, or learned clusters of latent variables with redundant interaction structure (reducing the expected number of interaction coefficients from $O(n^2)$ to $O((\log n)^2)$); a fully Bayesian formulation of the stochastic dynamical systems model that propagates measurement and latent state uncertainty throughout the model; and introduction of a temporally varying auxiliary variable technique to enable efficient inference by relaxing the hard non-negativity constraint on states. We apply our method to simulated and real data, and demonstrate the utility of our technique for system identification from limited data and gaining new biological insights into bacteriotherapy design.


Stochastic Block Models are a Discrete Surface Tension

arXiv.org Machine Learning

Networks, which represent agents and interactions between them, arise in myriad applications throughout the sciences, engineering, and even the humanities. To understand large-scale structure in a network, a common task is to cluster a network's nodes into sets called "communities" such that there are dense connections within communities but sparse connections between them. A popular and statistically principled method to perform such clustering is to use a family of generative models known as stochastic block models (SBMs). In this paper, we show that maximum likelihood estimation in an SBM is a network analog of a well-known continuum surface-tension problem that arises from an application in metallurgy. To illustrate the utility of this bridge, we implement network analogs of three surface-tension algorithms, with which we successfully recover planted community structure in synthetic networks and which yield fascinating insights on empirical networks from the field of hyperspectral video segmentation.


Human-aided Multi-Entity Bayesian Networks Learning from Relational Data

arXiv.org Machine Learning

An Artificial Intelligence (AI) system is an autonomous system which emulates human mental and physical activities such as Observe, Orient, Decide, and Act, called the OODA process. An AI system performing the OODA process requires a semantically rich representation to handle a complex real world situation and ability to reason under uncertainty about the situation. Multi-Entity Bayesian Networks (MEBNs) combines First-Order Logic with Bayesian Networks for representing and reasoning about uncertainty in complex, knowledge-rich domains. MEBN goes beyond standard Bayesian networks to enable reasoning about an unknown number of entities interacting with each other in various types of relationships, a key requirement for the OODA process of an AI system. MEBN models have heretofore been constructed manually by a domain expert. However, manual MEBN modeling is labor-intensive and insufficiently agile. To address these problems, an efficient method is needed for MEBN modeling. One of the methods is to use machine learning to learn a MEBN model in whole or in part from data. In the era of Big Data, data-rich environments, characterized by uncertainty and complexity, have become ubiquitous. The larger the data sample is, the more accurate the results of the machine learning approach can be. Therefore, machine learning has potential to improve the quality of MEBN models as well as the effectiveness for MEBN modeling. In this research, we study a MEBN learning framework to develop a MEBN model from a combination of domain expert's knowledge and data. To evaluate the MEBN learning framework, we conduct an experiment to compare the MEBN learning framework and the existing manual MEBN modeling in terms of development efficiency.


Gaussian Mixture Reduction for Time-Constrained Approximate Inference in Hybrid Bayesian Networks

arXiv.org Artificial Intelligence

Hybrid Bayesian Networks (HBNs), which contain both discrete and continuous variables, arise naturally in many application areas (e.g., image understanding, data fusion, medical diagnosis, fraud detection). This paper concerns inference in an important subclass of HBNs, the conditional Gaussian (CG) networks, in which all continuous random variables have Gaussian distributions and all children of continuous random variables must be continuous. Inference in CG networks can be NP-hard even for special-case structures, such as poly-trees, where inference in discrete Bayesian networks can be performed in polynomial time. Therefore, approximate inference is required. In approximate inference, it is often necessary to trade off accuracy against solution time. This paper presents an extension to the Hybrid Message Passing inference algorithm for general CG networks and an algorithm for optimizing its accuracy given a bound on computation time. The extended algorithm uses Gaussian mixture reduction to prevent an exponential increase in the number of Gaussian mixture components. The trade-off algorithm performs pre-processing to find optimal run-time settings for the extended algorithm. Experimental results for four CG networks compare performance of the extended algorithm with existing algorithms and show the optimal settings for these CG networks.


Model-free, Model-based, and General Intelligence

arXiv.org Artificial Intelligence

During the 60s and 70s, AI researchers explored intuitions about intelligence by writing programs that displayed intelligent behavior. Many good ideas came out from this work but programs written by hand were not robust or general. After the 80s, research increasingly shifted to the development of learners capable of inferring behavior and functions from experience and data, and solvers capable of tackling well-defined but intractable models like SAT, classical planning, Bayesian networks, and POMDPs. The learning approach has achieved considerable success but results in black boxes that do not have the flexibility, transparency, and generality of their model-based counterparts. Model-based approaches, on the other hand, require models and scalable algorithms. Model-free learners and model-based solvers have close parallels with Systems 1 and 2 in current theories of the human mind: the first, a fast, opaque, and inflexible intuitive mind; the second, a slow, transparent, and flexible analytical mind. In this paper, I review developments in AI and draw on these theories to discuss the gap between model-free learners and model-based solvers, a gap that needs to be bridged in order to have intelligent systems that are robust and general.


Constrained Counting and Sampling: Bridging the Gap between Theory and Practice

arXiv.org Artificial Intelligence

Constrained counting and sampling are two fundamental problems in Computer Science with numerous applications, including network reliability, privacy, probabilistic reasoning, and constrained-random verification. In constrained counting, the task is to compute the total weight, subject to a given weighting function, of the set of solutions of the given constraints. In constrained sampling, the task is to sample randomly, subject to a given weighting function, from the set of solutions to a set of given constraints. Consequently, constrained counting and sampling have been subject to intense theoretical and empirical investigations over the years. Prior work, however, offered either heuristic techniques with poor guarantees of accuracy or approaches with proven guarantees but poor performance in practice. In this thesis, we introduce a novel hashing-based algorithmic framework for constrained sampling and counting that combines the classical algorithmic technique of universal hashing with the dramatic progress made in combinatorial reasoning tools, in particular, SAT and SMT, over the past two decades. The resulting frameworks for counting (ApproxMC2) and sampling (UniGen) can handle formulas with up to million variables representing a significant boost up from the prior state of the art tools' capability to handle few hundreds of variables. If the initial set of constraints is expressed as Disjunctive Normal Form (DNF), ApproxMC is the only known Fully Polynomial Randomized Approximation Scheme (FPRAS) that does not involve Monte Carlo steps. By exploiting the connection between definability of formulas and variance of the distribution of solutions in a cell defined by 3-universal hash functions, we introduced an algorithmic technique, MIS, that reduced the size of XOR constraints employed in the underlying universal hash functions by as much as two orders of magnitude.


Conditional probability calculation using restricted Boltzmann machine with application to system identification

arXiv.org Machine Learning

There are many advantages to use probability method for nonlinear system identification, such as the noises and outliers in the data set do not affect the probability models significantly; the input features can be extracted in probability forms. The biggest obstacle of the probability model is the probability distributions are not easy to be obtained. In this paper, we form the nonlinear system identification into solving the conditional probability. Then we modify the restricted Boltzmann machine (RBM), such that the joint probability, input distribution, and the conditional probability can be calculated by the RBM training. Binary encoding and continue valued methods are discussed. The universal approximation analysis for the conditional probability based modelling is proposed. We use two benchmark nonlinear systems to compare our probability modelling method with the other black-box modeling methods. The results show that this novel method is much better when there are big noises and the system dynamics are complex.