 Learning Graphical Models


Block-Wise MAP Inference for Determinantal Point Processes with Application to Change-Point Detection

arXiv.org Machine Learning

Existing MAP inference algorithms for determinantal point processes (DPPs) need to calculate determinants or conduct eigenvalue decomposition generally at the scale of the full kernel, which presents a great challenge for real-world applications. In this paper, we introduce a class of DPPs, called BwDPPs, that are characterized by an almost block diagonal kernel matrix and thus can allow efficient block-wise MAP inference. Furthermore, BwDPPs are successfully applied to address the difficulty of selecting change-points in the problem of change-point detection (CPD), which results in a new BwDPP-based CPD method, named BwDppCpd. In BwDppCpd, a preliminary set of change-point candidates is first created based on existing well-studied metrics. Then, these change-point candidates are treated as DPP items, and DPP-based subset selection is conducted to give the final estimate of the change-points that favours both quality and diversity. The effectiveness of BwDppCpd is demonstrated through extensive experiments on five real-world datasets.
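The block-wise idea can be illustrated with a minimal sketch. It assumes an exactly block diagonal kernel, which is simpler than the almost block diagonal case the abstract describes, and it uses a plain greedy search as a stand-in for whichever MAP routine is actually applied per block. The function names `greedy_map` and `blockwise_map` are hypothetical. Because the determinant of a block diagonal matrix factorizes over its blocks, each block can be searched on its own, so no determinant is ever computed at the scale of the full kernel.

```python
import numpy as np

def greedy_map(L):
    """Greedy MAP for a DPP with kernel L: repeatedly add the item
    that most increases det(L_S), and stop when no item increases it."""
    n = L.shape[0]
    selected = []
    best = 1.0  # determinant of the empty submatrix is 1 by convention
    while True:
        candidates = []
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            candidates.append((np.linalg.det(L[np.ix_(idx, idx)]), i))
        if not candidates:
            break
        det_value, item = max(candidates)
        if det_value <= best:
            break
        selected.append(item)
        best = det_value
    return sorted(selected), best

def blockwise_map(L, block_sizes):
    """Run greedy MAP independently on each diagonal block
    (assumes L is exactly block diagonal with the given block sizes)."""
    chosen = []
    start = 0
    for size in block_sizes:
        block = L[start:start + size, start:start + size]
        local, _ = greedy_map(block)
        chosen.extend(start + i for i in local)  # map back to global indices
        start += size
    return chosen
```

On an exactly block diagonal kernel, the block-wise search and the full-kernel greedy search should pick the same items whenever there are no ties, while the block-wise version only ever forms determinants of sub-blocks.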


Non-parametric Bayesian Models of Response Function in Dynamic Image Sequences

arXiv.org Machine Learning

Estimation of response functions is an important task in dynamic medical imaging. This task arises, for example, in dynamic renal scintigraphy, where impulse response or retention functions are estimated, or in functional magnetic resonance imaging, where hemodynamic response functions are required. These functions cannot be observed directly, and their estimation is complicated because the recorded images are subject to superposition of underlying signals. Therefore, the response functions are estimated via blind source separation and deconvolution. The performance of this algorithm depends heavily on the models used for the response functions. Response functions in real image sequences are rather complicated, and finding a suitable parametric form is problematic. In this paper, we study estimation of the response functions using non-parametric Bayesian priors. These priors were designed to favor desirable properties of the functions, such as sparsity or smoothness. These assumptions are used within hierarchical priors of the blind source separation and deconvolution algorithm. The resulting algorithms with these priors are compared on a synthetic dataset as well as on real datasets from dynamic renal scintigraphy. It is shown that flexible non-parametric priors improve estimation of response functions in both cases. A MATLAB implementation of the resulting algorithms is freely available for download.


The Knowledge Gradient Policy Using A Sparse Additive Belief Model

arXiv.org Machine Learning

We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R&S) problems with high-dimensional sparse belief functions, where there are hundreds or even thousands of features, but only a small portion of these features contain explanatory power. We aim to identify the sparsity pattern and select the best alternative before the finite budget is exhausted. We derive a knowledge gradient policy for sparse linear models (KGSpLin) with a group Lasso penalty. This policy is a unique and novel hybrid of Bayesian R&S with frequentist learning. In particular, our method naturally combines with B-spline basis expansion and generalizes to the nonparametric additive model (KGSpAM) and the functional ANOVA model. Theoretically, we provide estimation error bounds for the posterior mean estimate and the functional estimate. Controlled experiments show that the algorithm efficiently learns the correct set of nonzero parameters even when the model is embedded with hundreds of dummy parameters. It also outperforms the knowledge gradient for a linear model.


Interpretable Aircraft Engine Diagnostic via Expert Indicator Aggregation

arXiv.org Machine Learning

Detecting early signs of failures (anomalies) in complex systems is one of the main goals of preventive maintenance. In particular, it allows actual failures to be avoided by (re)scheduling maintenance operations in a way that optimizes maintenance costs. Aircraft engine health monitoring is one representative example of a field in which anomaly detection is crucial. Manufacturers collect large amounts of engine-related data during flights, which are used, among other applications, to detect anomalies. This article introduces and studies a generic methodology for building automatic detection of early signs of anomalies in a way that builds upon human expertise and that remains understandable by the human operators who make the final maintenance decision. The main idea of the method is to generate a very large number of binary indicators based on parametric anomaly scores designed by experts, complemented by simple aggregations of those scores. A feature selection method is used to keep only the most discriminant indicators, which are used as inputs to a Naive Bayes classifier. This gives an interpretable classifier based on interpretable anomaly detectors whose parameters have been optimized indirectly by the selection process. The proposed methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real-world engines.
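The pipeline described above (expert scores, binary indicators, feature selection, Naive Bayes) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not name the feature selection method, so the class-gap criterion below is a stand-in, the score aggregations are omitted, and all function names (`make_indicators`, `select_indicators`, `fit_naive_bayes`, `predict`) are hypothetical.

```python
import numpy as np

def make_indicators(scores, thresholds):
    """Turn each anomaly score column into one binary indicator per threshold.
    scores: (n_samples, n_scores); result: (n_samples, n_scores * n_thresholds)."""
    return np.concatenate([(scores > t).astype(int) for t in thresholds], axis=1)

def select_indicators(X, y, k):
    """Keep the k indicators whose firing rates differ most between the two
    classes. This criterion is an assumption, not the paper's method."""
    gap = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(-gap, kind="stable")[:k]

def fit_naive_bayes(X, y):
    """Bernoulli Naive Bayes with Laplace smoothing.
    Returns log class priors and per-class indicator firing probabilities."""
    classes = (0, 1)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    p = np.array([(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2)
                  for c in classes])
    return log_prior, p

def predict(X, log_prior, p):
    """Pick the class with the highest posterior log-probability."""
    log_lik = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
    return np.argmax(log_lik + log_prior, axis=1)
```

Each retained indicator is a threshold on an expert-designed score, so the fitted per-class firing probabilities can be read directly by an operator, which is the interpretability argument the abstract makes.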


Shared latent subspace modelling within Gaussian-Binary Restricted Boltzmann Machines for NIST i-Vector Challenge 2014

arXiv.org Machine Learning

This paper presents a novel approach to speaker subspace modelling based on Gaussian-Binary Restricted Boltzmann Machines (GRBM). The proposed model is based on the idea of shared factors, as in Probabilistic Linear Discriminant Analysis (PLDA). The GRBM hidden layer is divided into speaker and channel factors, where the speaker factor is shared over all vectors of the speaker. Maximum Likelihood Parameter Estimation (MLE) for the proposed model is then introduced. Various new scoring techniques for speaker verification using GRBM are proposed. Results on the NIST i-vector Challenge 2014 dataset are presented.


L0 Sparse Inverse Covariance Estimation

arXiv.org Machine Learning

Recently, there has been a focus on penalized log-likelihood covariance estimation for sparse inverse covariance (precision) matrices. The penalty is responsible for inducing sparsity, and a very common choice is the convex $l_1$ norm. However, the best estimator performance is not always achieved with this penalty. The most natural sparsity-promoting "norm" is the non-convex $l_0$ penalty, but its lack of convexity has deterred its use in sparse maximum likelihood estimation. In this paper we consider non-convex $l_0$ penalized log-likelihood inverse covariance estimation and present a novel cyclic descent algorithm for its optimization. Convergence to a local minimizer is proved, which is highly non-trivial, and we demonstrate via simulations the reduced bias and superior quality of the $l_0$ penalty as compared to the $l_1$ penalty.
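For orientation, the two penalized estimators being contrasted can be written in the standard Gaussian form. The notation is an assumption, not taken from the abstract: $S$ is the sample covariance matrix, $\Theta$ the precision matrix, and $\lambda \ge 0$ the penalty weight. The paper's exact formulation, for example whether diagonal entries are penalized, may differ.

```latex
\hat{\Theta}_{l_1} = \arg\min_{\Theta \succ 0} \; -\log\det\Theta + \operatorname{tr}(S\Theta) + \lambda \sum_{i,j} |\Theta_{ij}|
\qquad
\hat{\Theta}_{l_0} = \arg\min_{\Theta \succ 0} \; -\log\det\Theta + \operatorname{tr}(S\Theta) + \lambda \, \bigl|\{(i,j) : \Theta_{ij} \neq 0\}\bigr|
```

The $l_1$ term grows with the magnitude of every nonzero entry and therefore shrinks the entries it keeps, whereas the $l_0$ term only counts nonzeros. This is the usual explanation for the reduced bias the abstract reports.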


Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

arXiv.org Machine Learning

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance-related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounted and average reward Markov decision processes. For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize. For each of these criteria, we derive a formula for computing its gradient. We then devise actor-critic algorithms that operate on three timescales: a TD critic on the fastest timescale, a policy gradient (actor) on the intermediate timescale, and a dual ascent for Lagrange multipliers on the slowest timescale. In the discounted setting, we point out the difficulty in estimating the gradient of the variance of the return and incorporate simultaneous perturbation approaches to alleviate this. The average setting, on the other hand, allows for an actor update using compatible features to estimate the gradient of the variance. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application.
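The three-timescale structure can be made concrete with a generic Lagrangian sketch. All symbols here are assumptions introduced for illustration: $J(\theta)$ is the standard criterion for policy parameters $\theta$, $\Lambda(\theta)$ the variability measure, $\alpha$ a tolerance on it, $\lambda \ge 0$ the Lagrange multiplier, and $a_t$, $b_t$, $c_t$ the step sizes of the critic, actor, and multiplier updates. The paper's own notation and conditions may differ.

```latex
\max_{\lambda \ge 0} \; \min_{\theta} \; L(\theta, \lambda) = -J(\theta) + \lambda \bigl( \Lambda(\theta) - \alpha \bigr)

\sum_t a_t = \sum_t b_t = \sum_t c_t = \infty, \qquad
\sum_t \bigl( a_t^2 + b_t^2 + c_t^2 \bigr) < \infty, \qquad
\frac{b_t}{a_t} \to 0, \qquad \frac{c_t}{b_t} \to 0
```

Under step sizes of this kind, the critic sees an almost fixed policy, the actor sees an almost converged critic and an almost fixed multiplier, and the multiplier performs dual ascent against an almost converged actor.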


CORPP: Commonsense Reasoning and Probabilistic Planning, as Applied to Dialog with a Mobile Robot

AAAI Conferences

In order to be fully robust and responsive to a dynamically changing real-world environment, intelligent robots will need to engage in a variety of simultaneous reasoning modalities. In particular, in this paper we consider their needs to i) reason with commonsense knowledge, ii) model their nondeterministic action outcomes and partial observability, and iii) plan toward maximizing long-term rewards. On one hand, Answer Set Programming (ASP) is good at representing and reasoning with commonsense and default knowledge, but is ill-equipped to plan under probabilistic uncertainty. On the other hand, Partially Observable Markov Decision Processes (POMDPs) are strong at planning under uncertainty toward maximizing long-term rewards, but are not designed to incorporate commonsense knowledge and inference. This paper introduces the CORPP algorithm which combines P-log, a probabilistic extension of ASP, with POMDPs to integrate commonsense reasoning with planning under uncertainty. Our approach is fully implemented and tested on a shopping request identification problem both in simulation and on a real robot. Compared with existing approaches using P-log or POMDPs individually, we observe significant improvements in both efficiency and accuracy.


Towards Extracting Faithful and Descriptive Representations of Latent Variable Models

AAAI Conferences

Methods that use latent representations of data, such as matrix and tensor factorization or deep neural methods, are becoming increasingly popular for applications such as knowledge base population and recommendation systems. These approaches have been shown to be very robust and scalable but, in contrast to more symbolic approaches, lack interpretability. This makes debugging such models difficult, and might result in users not trusting the predictions of such systems. To overcome this issue we propose to extract an interpretable proxy model from a predictive latent variable model. We use a so-called pedagogical method, where we query our predictive model to obtain observations needed for learning a descriptive model. We describe two families of (presumably more) descriptive models, simple logic rules and Bayesian networks, and show how members of these families provide descriptive representations of matrix factorization models. Preliminary experiments on knowledge extraction from text indicate that even though Bayesian networks may be more faithful to a matrix factorization model than the logic rules, the latter are possibly more useful for interpretation and debugging.


A Probabilistic Extension of the Stable Model Semantics

AAAI Conferences

We present a probabilistic extension of logic programs under the stable model semantics, inspired by the idea of Markov Logic Networks. The proposed language, called $\mathrm{LP}^{\mathrm{MLN}}$, is a generalization of logic programs under the stable model semantics, and as such, embraces the rich body of research in knowledge representation. The language is also a generalization of ProbLog, and is closely related to Markov Logic Networks, which implies that the computation can be carried out by the techniques developed for them. $\mathrm{LP}^{\mathrm{MLN}}$ appears to be a natural language for probabilistic answer set programming, and as an example we show how an elaboration tolerant representation of transition systems in answer set programs can be naturally extended to the probabilistic setting.