Goto

Collaborating Authors

 Bayesian Inference


An Algorithmic Theory of Dependent Regularizers, Part 1: Submodular Structure

arXiv.org Machine Learning

We present an exploration of the rich theoretical connections between several classes of regularized models, network flows, and recent results in submodular function theory. This work unifies key aspects of these problems under a common theory, leading to novel methods for working with several important models of interest in statistics, machine learning and computer vision. In Part 1, we review the concepts of network flows and submodular function optimization theory foundational to our results. We then examine the connections between network flows and the minimum-norm algorithm from submodular optimization, extending and improving several current results. This leads to a concise representation of the structure of a large class of pairwise regularized models important in machine learning, statistics and computer vision. In Part 2, we describe the full regularization path of a class of penalized regression problems with dependent variables that includes the graph-guided LASSO and total variation constrained models. This description also motivates a practical algorithm. This allows us to efficiently find the regularization path of the discretized version of TV penalized models. Ultimately, our new algorithms scale up to high-dimensional problems with millions of variables.


Object-oriented Bayesian networks for a decision support system for antitrust enforcement

arXiv.org Artificial Intelligence

We study an economic decision problem where the actors are two firms and the Antitrust Authority whose main task is to monitor and prevent firms' potential anti-competitive behaviour and its effect on the market. The Antitrust Authority's decision process is modelled using a Bayesian network where both the relational structure and the parameters of the model are estimated from a data set provided by the Authority itself. A number of economic variables that influence this decision process are also included in the model. We analyse how monitoring by the Antitrust Authority affects firms' strategies about cooperation. Firms' strategies are modelled as a repeated prisoner's dilemma using object-oriented Bayesian networks. We show how the integration of firms' decision process and external market information can be modelled in this way. Various decision scenarios and strategies are illustrated.


Multiscale Dictionary Learning for Estimating Conditional Distributions

arXiv.org Machine Learning

Massive datasets are becoming an ubiquitous byproduct of modern scientific and industrial applications. These data present statistical and computational challenges because many previously developed analysis approaches do not scaleup sufficiently. Challenges arise because of the ultra high-dimensionality and relatively low sample size. Parsimonious models for such big data assume that the density in the ambient space concentrates around a lower-dimensional (possibly nonlinear) subspace. A plethora of methods are emerging to estimate such lower-dimensional subspaces [25, 2]. 1 We are interested in using such lower-dimensional embeddings to obtain estimates of the conditional distribution of some target variable(s). This conditional density estimation setting arises in a number of important application areas, including neuroscience, genetics, and video processing. For example, one might desire automated estimation of a predictive density for a neurologic phenotype of interest, such as intelligence, on the basis of available data for a patient including neuroimaging.


Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search

Journal of Artificial Intelligence Research

Bayesian planning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, planning optimally in the face of uncertainty is notoriously taxing, since the search space is enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach avoids expensive applications of Bayes rule within the search tree by sampling models from current beliefs, and furthermore performs this sampling in a lazy manner. This enables it to outperform previous Bayesian model-based reinforcement learning algorithms by a significant margin on several well-known benchmark problems. As we show, our approach can even work in problems with an infinite state space that lie qualitatively out of reach of almost all previous work in Bayesian exploration.


One-Class Classification: Taxonomy of Study and Review of Techniques

arXiv.org Artificial Intelligence

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.


Viterbi training in PRISM

arXiv.org Artificial Intelligence

VT (Viterbi training), or hard EM, is an efficient way of parameter learning for probabilistic models with hidden variables. Given an observation $y$, it searches for a state of hidden variables $x$ that maximizes $p(x,y \mid \theta)$ by coordinate ascent on parameters $\theta$ and $x$. In this paper we introduce VT to PRISM, a logic-based probabilistic modeling system for generative models. VT improves PRISM in three ways. First VT in PRISM converges faster than EM in PRISM due to the VT's termination condition. Second, parameters learned by VT often show good prediction performance compared to those learned by EM. We conducted two parsing experiments with probabilistic grammars while learning parameters by a variety of inference methods, i.e.\ VT, EM, MAP and VB. The result is that VT achieved the best parsing accuracy among them in both experiments. Also we conducted a similar experiment for classification tasks where a hidden variable is not a prediction target unlike probabilistic grammars. We found that in such a case VT does not necessarily yield superior performance. Third since VT always deals with a single probability of a single explanation, Viterbi explanation, the exclusiveness condition that is imposed on PRISM programs is no more required if we learn parameters by VT. Last but not least we can say that as VT in PRISM is general and applicable to any PRISM program, it largely reduces the need for the user to develop a specific VT algorithm for a specific model. Furthermore since VT in PRISM can be used just by setting a PRISM flag appropriately, it makes VT easily accessible to (probabilistic) logic programmers. To appear in Theory and Practice of Logic Programming (TPLP).


Gaussian Probabilities and Expectation Propagation

arXiv.org Machine Learning

While Gaussian probability densities are omnipresent in applied mathematics, Gaussian cumulative probabilities are hard to calculate in any but the univariate case. We study the utility of Expectation Propagation (EP) as an approximate integration method for this problem. For rectangular integration regions, the approximation is highly accurate. We also extend the derivations to the more general case of polyhedral integration regions. However, we find that in this polyhedral case, EP's answer, though often accurate, can be almost arbitrarily wrong. We consider these unexpected results empirically and theoretically, both for the problem of Gaussian probabilities and for EP more generally. These results elucidate an interesting and non-obvious feature of EP not yet studied in detail.


On Approximate Inference for Generalized Gaussian Process Models

arXiv.org Machine Learning

A generalized Gaussian process model (GGPM) is a unifying framework that encompasses many existing Gaussian process (GP) models, such as GP regression, classification, and counting. In the GGPM framework, the observation likelihood of the GP model is itself parameterized using the exponential family distribution (EFD). In this paper, we consider efficient algorithms for approximate inference on GGPMs using the general form of the EFD. A particular GP model and its associated inference algorithms can then be formed by changing the parameters of the EFD, thus greatly simplifying its creation for task-specific output domains. We demonstrate the efficacy of this framework by creating several new GP models for regressing to non-negative reals and to real intervals. We also consider a closed-form Taylor approximation for efficient inference on GGPMs, and elaborate on its connections with other model-specific heuristic closed-form approximations. Finally, we present a comprehensive set of experiments to compare approximate inference algorithms on a wide variety of GGPMs.


Unsupervised Sub-tree Alignment for Tree-to-Tree Translation

Journal of Artificial Intelligence Research

This article presents a probabilistic sub-tree alignment model and its application to tree-to-tree machine translation. Unlike previous work, we do not resort to surface heuristics or expensive annotated data, but instead derive an unsupervised model to infer the syntactic correspondence between two languages. More importantly, the developed model is syntactically-motivated and does not rely on word alignments. As a by-product, our model outputs a sub-tree alignment matrix encoding a large number of diverse alignments between syntactic structures, from which machine translation systems can efficiently extract translation rules that are often filtered out due to the errors in 1-best alignment. Experimental results show that the proposed approach outperforms three state-of-the-art baseline approaches in both alignment accuracy and grammar quality. When applied to machine translation, our approach yields a +1.0 BLEU improvement and a -0.9 TER reduction on the NIST machine translation evaluation corpora. With tree binarization and fuzzy decoding, it even outperforms a state-of-the-art hierarchical phrase-based system.


Learning Pairwise Graphical Models with Nonlinear Sufficient Statistics

arXiv.org Machine Learning

We investigate a generic problem of learning pairwise exponential family graphical models with pairwise sufficient statistics defined by a global mapping function, e.g., Mercer kernels. This subclass of pairwise graphical models allow us to flexibly capture complex interactions among variables beyond pairwise product. We propose two $\ell_1$-norm penalized maximum likelihood estimators to learn the model parameters from i.i.d. samples. The first one is a joint estimator which estimates all the parameters simultaneously. The second one is a node-wise conditional estimator which estimates the parameters individually for each node. For both estimators, we show that under proper conditions the extra flexibility gained in our model comes at almost no cost of statistical and computational efficiency. We demonstrate the advantages of our model over state-of-the-art methods on synthetic and real datasets.