Goto

Collaborating Authors

 Uncertainty


Exact Algorithms for MRE Inference

Journal of Artificial Intelligence Research

Most Relevant Explanation (MRE) is an inference task in Bayesian networks that finds the most relevant partial instantiation of target variables as an explanation for given evidence by maximizing the Generalized Bayes Factor (GBF). No exact MRE algorithm has been developed previously except exhaustive search. This paper fills the void by introducing two Breadth-First Branch-and-Bound (BFBnB) algorithms for solving MRE based on novel upper bounds of GBF. One upper bound is created by decomposing the computation of GBF using a target blanket decomposition of evidence variables. The other upper bound improves the first bound in two ways. One is to split the target blankets that are too large by converting auxiliary nodes into pseudo-targets so as to scale to large problems. The other is to perform summations instead of maximizations on some of the target variables in each target blanket. Our empirical evaluations show that the proposed BFBnB algorithms make exact MRE inference tractable in Bayesian networks that could not be solved previously.


Patterns of Scalable Bayesian Inference

arXiv.org Machine Learning

Datasets are growing not just in size but in complexity, creating a demand for rich models and quantification of uncertainty. Bayesian methods are an excellent fit for this demand, but scaling Bayesian inference is a challenge. In response to this challenge, there has been considerable recent work based on varying assumptions about model structure, underlying computational resources, and the importance of asymptotic correctness. As a result, there is a zoo of ideas with few clear overarching principles. In this paper, we seek to identify unifying principles, patterns, and intuitions for scaling Bayesian inference. We review existing work on utilizing modern computing resources with both MCMC and variational approximation techniques. From this taxonomy of ideas, we characterize the general principles that have proven successful for designing scalable inference procedures and comment on the path forward.


Latent Dirichlet Allocation Using Gibbs Sampling

#artificialintelligence

Text clustering is a widely used techniques to automatically draw out patterns from a set of documents. This notion can be extended to customer segmentation in the digital marketing field. As one of its main core is to understand what drives visitors to come, leave and behave on site. One simple way to do this is by reviewing words that they used to arrive on site and what words they used ( what things they searched) once they're on your site. Another usage of text clustering is for document organization or indexing (tagging).


R Users Will Now Inevitably Become Bayesians

#artificialintelligence

There are several reasons why everyone isn't using Bayesian methods for regression modeling. One reason is that Bayesian modeling requires more thought: you need pesky things like priors, and you can't assume that if a procedure runs without throwing an error that the answers are valid. A second reason is that MCMC sampling -- the bedrock of practical Bayesian modeling -- can be slow compared to closed-form or MLE procedures. A third reason is that existing Bayesian solutions have either been highly-specialized (and thus inflexible), or have required knowing how to use a generalized tool like BUGS, JAGS, or Stan. This third reason has recently been shattered in the R world by not one but two packages: brms and rstanarm.


Variational Autoencoders for Feature Detection of Magnetic Resonance Imaging Data

arXiv.org Machine Learning

Independent component analysis (ICA), as an approach to the blind source-separation (BSS) problem, has become the de-facto standard in many medical imaging settings. Despite successes and a large ongoing research effort, the limitation of ICA to square linear transformations have not been overcome, so that general INFOMAX is still far from being realized. As an alternative, we present feature analysis in medical imaging as a problem solved by Helmholtz machines, which include dimensionality reduction and reconstruction of the raw data under the same objective, and which recently have overcome major difficulties in inference and learning with deep and nonlinear configurations. We demonstrate one approach to training Helmholtz machines, variational auto-encoders (VAE), as a viable approach toward feature extraction with magnetic resonance imaging (MRI) data.


Phase transitions and sample complexity in Bayes-optimal matrix factorization

arXiv.org Machine Learning

We analyse the matrix factorization problem. Given a noisy measurement of a product of two matrices, the problem is to estimate back the original matrices. It arises in many applications such as dictionary learning, blind matrix calibration, sparse principal component analysis, blind source separation, low rank matrix completion, robust principal component analysis or factor analysis. It is also important in machine learning: unsupervised representation learning can often be studied through matrix factorization. We use the tools of statistical mechanics - the cavity and replica methods - to analyze the achievability and computational tractability of the inference problems in the setting of Bayes-optimal inference, which amounts to assuming that the two matrices have random independent elements generated from some known distribution, and this information is available to the inference algorithm. In this setting, we compute the minimal mean-squared-error achievable in principle in any computational time, and the error that can be achieved by an efficient approximate message passing algorithm. The computation is based on the asymptotic state-evolution analysis of the algorithm. The performance that our analysis predicts, both in terms of the achieved mean-squared-error, and in terms of sample complexity, is extremely promising and motivating for a further development of the algorithm.


New Optimisation Methods for Machine Learning

arXiv.org Machine Learning

A thesis submitted for the degree of Doctor of Philosophy of The Australian National University. In this work we introduce several new optimisation methods for problems in machine learning. Our algorithms broadly fall into two categories: optimisation of finite sums and of graph structured objectives. The finite sum problem is simply the minimisation of objective functions that are naturally expressed as a summation over a large number of terms, where each term has a similar or identical weight. Such objectives most often appear in machine learning in the empirical risk minimisation framework in the non-online learning setting. The second category, that of graph structured objectives, consists of objectives that result from applying maximum likelihood to Markov random field models. Unlike the finite sum case, all the non-linearity is contained within a partition function term, which does not readily decompose into a summation. For the finite sum problem, we introduce the Finito and SAGA algorithms, as well as variants of each. For graph-structured problems, we take three complementary approaches. We look at learning the parameters for a fixed structure, learning the structure independently, and learning both simultaneously. Specifically, for the combined approach, we introduce a new method for encouraging graph structures with the "scale-free" property. For the structure learning problem, we establish SHORTCUT, a O(n^{2.5}) expected time approximate structure learning method for Gaussian graphical models. For problems where the structure is known but the parameters unknown, we introduce an approximate maximum likelihood learning algorithm that is capable of learning a useful subclass of Gaussian graphical models.


Towards An Architecture for Representation, Reasoning and Learning in Human-Robot Collaboration

AAAI Conferences

Robots collaborating with humans need to represent knowledge, reason, and learn, at the sensorimotor level and the cognitive level. This paper summarizes the capabilities of an architecture that combines the comple- mentary strengths of declarative programming, proba- bilistic graphical models, and reinforcement learning, to represent, reason with, and learn from, qualitative and quantitative descriptions of incomplete domain knowledge and uncertainty. Representation and reasoning is based on two tightly-coupled domain representations at different resolutions. For any given task, the coarse- resolution symbolic domain representation is translated to an Answer Set Prolog program, which is solved to provide a tentative plan of abstract actions, and to explain unexpected outcomes. Each abstract action is implemented by translating the relevant subset of the corresponding fine-resolution probabilistic representation to a partially observable Markov decision process (POMDP). Any high probability beliefs, obtained by the execution of actions based on the POMDP policy, update the coarse-resolution representation. When incomplete knowledge of the rules governing the domain dynamics results in plan execution not achieving the desired goal, the coarse-resolution and fine-resolution representations are used to formulate the task of incrementally and interactively discovering these rules as a reinforcement learning problem. These capabilities are illustrated in the context of a mobile robot deployed in an indoor office domain.


Accelerating a hybrid continuum-atomistic fluidic model with on-the-fly machine learning

arXiv.org Machine Learning

We present a hybrid continuum-atomistic scheme which combines molecular dynamics (MD) simulations with on-the-fly machine learning techniques for the accurate and efficient prediction of multiscale fluidic systems. By using a Gaussian process as a surrogate model for the computationally expensive MD simulations, we use Bayesian inference to predict the system behaviour at the atomistic scale, purely by consideration of the macroscopic inputs and outputs. Whenever the uncertainty of this prediction is greater than a predetermined acceptable threshold, a new MD simulation is performed to continually augment the database, which is never required to be complete. This provides a substantial enhancement to the current generation of hybrid methods, which often require many similar atomistic simulations to be performed, discarding information after it is used once. We apply our hybrid scheme to nano-confined unsteady flow through a high-aspect-ratio converging-diverging channel, and make comparisons between the new scheme and full MD simulations for a range of uncertainty thresholds and initial databases. For low thresholds, our hybrid solution is highly accurate\,---\,within the thermal noise of a full MD simulation. As the uncertainty threshold is raised, the accuracy of our scheme decreases and the computational speed-up increases (relative to a full MD simulation), enabling the compromise between precision and efficiency to be tuned. The speed-up of our hybrid solution ranges from an order of magnitude, with no initial database, to cases where an extensive initial database ensures no new MD simulations are required.


Sparse Coding with Earth Mover's Distance for Multi-Instance Histogram Representation

arXiv.org Machine Learning

Sparse coding (Sc) has been studied very well as a powerful data representation method. It attempts to represent the feature vector of a data sample by reconstructing it as the sparse linear combination of some basic elements, and a $L_2$ norm distance function is usually used as the loss function for the reconstruction error. In this paper, we investigate using Sc as the representation method within multi-instance learning framework, where a sample is given as a bag of instances, and further represented as a histogram of the quantized instances. We argue that for the data type of histogram, using $L_2$ norm distance is not suitable, and propose to use the earth mover's distance (EMD) instead of $L_2$ norm distance as a measure of the reconstruction error. By minimizing the EMD between the histogram of a sample and the its reconstruction from some basic histograms, a novel sparse coding method is developed, which is refereed as SC-EMD. We evaluate its performances as a histogram representation method in tow multi-instance learning problems --- abnormal image detection in wireless capsule endoscopy videos, and protein binding site retrieval. The encouraging results demonstrate the advantages of the new method over the traditional method using $L_2$ norm distance.