Goto

Collaborating Authors

 Uncertainty


Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data

arXiv.org Machine Learning

Multivariate categorical data occur in many applications of machine learning. One of the main difficulties with these vectors of categorical variables is sparsity. The number of possible observations grows exponentially with vector length, but dataset diversity might be poor in comparison. Recent models have gained significant improvement in supervised tasks with this data. These models embed observations in a continuous space to capture similarities between them. Building on these ideas we propose a Bayesian model for the unsupervised task of distribution estimation of multivariate categorical data. We model vectors of categorical variables as generated from a non-linear transformation of a continuous latent space. Non-linearity captures multi-modality in the distribution. The continuous representation addresses sparsity. Our model ties together many existing models, linking the linear categorical latent Gaussian model, the Gaussian process latent variable model, and Gaussian process classification. We derive inference for our model based on recent developments in sampling based variational inference. We show empirically that the model outperforms its linear and discrete counterparts in imputation tasks of sparse data.


The Informed Sampler: A Discriminative Approach to Bayesian Inference in Generative Computer Vision Models

arXiv.org Machine Learning

Computer vision is hard because of a large variability in lighting, shape, and texture; in addition the image signal is non-additive due to occlusion. Generative models promised to account for this variability by accurately modelling the image formation process as a function of latent variables with prior beliefs. Bayesian posterior inference could then, in principle, explain the observation. While intuitively appealing, generative models for computer vision have largely failed to deliver on that promise due to the difficulty of posterior inference. As a result the community has favoured efficient discriminative approaches. We still believe in the usefulness of generative models in computer vision, but argue that we need to leverage existing discriminative or even heuristic computer vision methods. We implement this idea in a principled way with an "informed sampler" and in careful experiments demonstrate it on challenging generative models which contain renderer programs as their components. We concentrate on the problem of inverting an existing graphics rendering engine, an approach that can be understood as "Inverse Graphics". The informed sampler, using simple discriminative proposals based on existing computer vision technology, achieves significant improvements of inference.


Parallel Gaussian Process Regression for Big Data: Low-Rank Representation Meets Markov Approximation

AAAI Conferences

The expressive power of a Gaussian process (GP) model comes at a cost of poor scalability in the data size. To improve its scalability, this paper presents a low-rank-cum-Markov approximation (LMA) of the GP model that is novel in leveraging the dual computational advantages stemming from complementing a low-rank approximate representation of the full-rank GP based on a support set of inputs with a Markov approximation of the resulting residual process; the latter approximation is guaranteed to be closest in the Kullback-Leibler distance criterion subject to some constraint and is considerably more refined than that of existing sparse GP models utilizing low-rank representations due to its more relaxed conditional independence assumption (especially with larger data). As a result, our LMA method can trade off between the size of the support set and the order of the Markov property to (a) incur lower computational cost than such sparse GP models while achieving predictive performance comparable to them and (b) accurately represent features/patterns of any scale. Interestingly, varying the Markov order produces a spectrum of LMAs with PIC approximation and full-rank GP at the two extremes. An advantage of our LMA method is that it is amenable to parallelization on multiple machines/cores, thereby gaining greater scalability. Empirical evaluation on three real-world datasets in clusters of up to 32 computing nodes shows that our centralized and parallel LMA methods are significantly more time-efficient and scalable than state-of-the-art sparse and full-rank GP regression methods while achieving comparable predictive performances.


Learning to Reject Sequential Importance Steps for Continuous-Time Bayesian Networks

AAAI Conferences

Applications of graphical models often require the use of approximate inference, such as sequential importance sampling (SIS), for estimation of the model distribution given partial evidence, i.e., the target distribution. However, when SIS proposal and target distributions are dissimilar, such procedures lead to biased estimates or require a prohibitive number of samples. We introduce ReBaSIS, a method that better approximates the target distribution by sampling variable by variable from existing importance samplers and accepting or rejecting each proposed assignment in the sequence: a choice made based on anticipating upcoming evidence. We relate the per-variable proposal and model distributions by expected weight ratios of sequence completions and show that we can learn accurate models of optimal acceptance probabilities from local samples. In a continuous-time domain, our method improves upon previous importance samplers by transforming an SIS problem into a machine learning one.


Loss-Calibrated Monte Carlo Action Selection

AAAI Conferences

Bayesian decision-theory underpins robust decision-making in applications ranging from plant control to robotics where hedging action selection against state uncertainty is critical for minimizing low probability but potentially catastrophic outcomes (e.g, uncontrollable plant conditions or robots falling into stairwells). Unfortunately, belief state distributions in such settings are often complex and/or high dimensional, thus prohibiting the efficient application of analytical techniques for expected utility computation when real-time control is required. This leaves Monte Carlo evaluation as one of the few viable (and hence frequently used) techniques for online action selection. However, loss-insensitive Monte Carlo methods may require large numbers of samples to identify optimal actions with high certainty since they may sample from highprobability regions that do not disambiguate action utilities. In this paper we remedy this problem by deriving an optimal proposal distribution for a loss-calibrated Monte Carlo importance sampler that bounds the regret of using an estimated optimal action. Empirically, we show that using our loss-calibrated Monte Carlo method yields high-accuracy optimal action selections in a fraction of the number of samples required by conventional loss-insensitive samplers.


Automated Analysis of Commitment Protocols Using Probabilistic Model Checking

AAAI Conferences

Commitment protocols provide an effective formalism for the regulation of agent interaction. Although existing work mainly focus on the design-time development of static commitment protocols, recent studies propose methods to create them dynamically at run-time with respect to the goals of the agents. These methods require agents to verify new commitment protocols taking their goals, and beliefs about the other agentsโ€™ behavior into account. Accordingly, in this paper, we first propose a probabilistic model to formally capture commitment protocols according to agentsโ€™ beliefs. Secondly, we identify a set of important properties for the verification of a new commitment protocol from an agentโ€™s perspective and formalize these properties in our model. Thirdly, we develop probabilistic model checking algorithms with advanced reduction for efficient verification of these properties. Finally, we implement these algorithms as a tool and evaluate the proposed properties over different commitment protocols.


Bayesian Networks Specified Using Propositional and Relational Constructs: Combined, Data, and Domain Complexity

AAAI Conferences

We examine the inferential complexity of Bayesian networks specified through logical constructs. We first consider simple propositional languages, and then move to relational languages. We examine both the combined complexity of inference (as network size and evidence size are not bounded) and the data complexity of inference (where network size is bounded); we also examine the connection to liftability through domain complexity. Combined and data complexity of several inference problems are presented, ranging from polynomial to exponential classes.


Bayesian Model Averaging Naive Bayes (BMA-NB): Averaging over an Exponential Number of Feature Models in Linear Time

AAAI Conferences

Naive Bayes (NB) is well-known to be a simple but effective classifier, especially when combined with feature selection. Unfortunately, feature selection methods are often greedy and thus cannot guarantee an optimal feature set is selected. An alternative to feature selection is to use Bayesian model averaging (BMA), which computes a weighted average over multiple predictors; when the different predictor models correspond to different feature sets, BMA has the advantage over feature selection that its predictions tend to have lower variance on average in comparison to any single model. In this paper, we show for the first time that it is possible to exactly evaluate BMA over the exponentially-sized powerset of NB feature models in linear-time in the number of features; this yields an algorithm about as expensive to train as a single NB model with all features, but yet provably converges to the globally optimal feature subset in the asymptotic limit of data. We evaluate this novel BMA-NB classifier on a range of datasets showing that it never underperforms NB (as expected) and sometimes offers performance competitive (or superior) to classifiers such as SVMs and logistic regression while taking a fraction of the time to train.


Obtaining Well Calibrated Probabilities Using Bayesian Binning

AAAI Conferences

However, model calibration and the learning is critical for many prediction and decision-making of well-calibrated probabilistic models have not been tasks in artificial intelligence. In this paper we present a new studied in the machine learning literature as extensively as nonparametric calibration method called Bayesian Binning for example discriminative machine learning models that into Quantiles (BBQ) which addresses key limitations of existing are built to achieve the best possible discrimination among calibration methods. The method post processes the classes of objects. One way to achieve a high level of model output of a binary classification algorithm; thus, it can be calibration is to develop methods for learning probabilistic readily combined with many existing classification algorithms.


On the Role of Canonicity in Knowledge Compilation

AAAI Conferences

Knowledge compilation is a powerful reasoning paradigm with many applications across AI and computer science more broadly. We consider the problem of bottom-up compilation of knowledge bases, which is usually predicated on the existence of a polytime function for combining compilations using Boolean operators (usually called an Apply function). While such a polytime Apply function is known to exist for certain languages (e.g., OBDDs) and not exist for others (e.g., DNNFs), its existence for certain languages remains unknown. Among the latter is the recently introduced language of Sentential Decision Diagrams (SDDs): while a polytime Apply function exists for SDDs, it was unknown whether such a function exists for the important subset of compressed SDDs which are canonical. We resolve this open question in this paper and consider some of its theoretical and practical implications. Some of the findings we report question the common wisdom on the relationship between bottom-up compilation, language canonicity and the complexity of the Apply function.