Genre
Accelerating EM: An Empirical Study
Ortiz, Luis E., Kaelbling, Leslie Pack
Many applications require that we learn the parameters of a model from data. EM is a method used to learn the parameters of probabilistic models for which the data for some of the variables in the models is either missing or hidden. There are instances in which this method is slow to converge. Therefore, several accelerations have been proposed to improve the method. None of the proposed acceleration methods are theoretically dominant and experimental comparisons are lacking. In this paper, we present the different proposed accelerations and try to compare them experimentally. From the results of the experiments, we argue that some acceleration of EM is always possible, but that which acceleration is superior depends on properties of the problem.
Probabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results. in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM.
Inferring Parameters and Structure of Latent Variable Models by Variational Bayes
Current methods for learning graphical models with latent variables and a fixed structure estimate optimal values for the model parameters. Whereas this approach usually produces overfitting and suboptimal generalization performance, carrying out the Bayesian program of computing the full posterior distributions over the parameters remains a difficult problem. Moreover, learning the structure of models with latent variables, for which the Bayesian approach is crucial, is yet a harder problem. In this paper I present the Variational Bayes framework, which provides a solution to these problems. This approach approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner without resorting to sampling methods. Unlike in the Laplace approximation, these posteriors are generally non-Gaussian and no Hessian needs to be computed. The resulting algorithm generalizes the standard Expectation Maximization algorithm, and its convergence is guaranteed. I demonstrate that this algorithm can be applied to a large class of models in several domains, including unsupervised clustering and blind source separation.
Multi-class Generalized Binary Search for Active Inverse Reinforcement Learning
Melo, Francisco, Lopes, Manuel
This paper addresses the problem of learning a task from demonstration. We adopt the framework of inverse reinforcement learning, where tasks are represented in the form of a reward function. Our contribution is a novel active learning algorithm that enables the learning agent to query the expert for more informative demonstrations, thus leading to more sample-efficient learning. For this novel algorithm (Generalized Binary Search for Inverse Reinforcement Learning, or GBS-IRL), we provide a theoretical bound on sample complexity and illustrate its applicability on several different tasks. To our knowledge, GBS-IRL is the first active IRL algorithm with provable sample complexity bounds. We also discuss our method in light of other existing methods in the literature and its general applicability in multi-class classification problems. Finally, motivated by recent work on learning from demonstration in robots, we also discuss how different forms of human feedback can be integrated in a transparent manner in our learning framework.
Contextual Weak Independence in Bayesian Networks
Wong, Michael S. K. M., Butz, C. J.
It is well-known that the notion of (strong) conditional independence (CI) is too restrictive to capture independencies that only hold in certain contexts. This kind of contextual independency, called context-strong independence (CSI), can be used to facilitate the acquisition, representation, and inference of probabilistic knowledge. In this paper, we suggest the use of contextual weak independence (CWI) in Bayesian networks. It should be emphasized that the notion of CWI is a more general form of contextual independence than CSI. Furthermore, if the contextual strong independence holds for all contexts, then the notion of CSI becomes strong CI. On the other hand, if the weak contextual independence holds for all contexts, then the notion of CWI becomes weak independence (WI) nwhich is a more general noncontextual independency than strong CI. More importantly, complete axiomatizations are studied for both the class of WI and the class of CI and WI together. Finally, the interesting property of WI being a necessary and sufficient condition for ensuring consistency in granular probabilistic networks is shown.
Probabilistic Belief Change: Expansion, Conditioning and Constraining
The AGM theory of belief revision has become an important paradigm for investigating rational belief changes. Unfortunately, researchers working in this paradigm have restricted much of their attention to rather simple representations of belief states, namely logically closed sets of propositional sentences. In our opinion, this has resulted in a too abstract categorisation of belief change operations: expansion, revision, or contraction. Occasionally, in the AGM paradigm, also probabilistic belief changes have been considered, and it is widely accepted that the probabilistic version of expansion is conditioning. However, we argue that it may be more correct to view conditioning and expansion as two essentially different kinds of belief change, and that what we call constraining is a better candidate for being considered probabilistic expansion.
How to Elicit Many Probabilities
van der Gaag, Linda C., Renooij, Silja, Witteman, Cilia L. M., Aleman, Berthe M. P., Taal, Babs G.
In building Bayesian belief networks, the elicitation of all probabilities required can be a major obstacle. We learned the extent of this often-cited observation in the construction of the probabilistic part of a complex influence diagram in the field of cancer treatment. Based upon our negative experiences with existing methods, we designed a new method for probability elicitation from domain experts. The method combines various ideas, among which are the ideas of transcribing probabilities and of using a scale with both numerical and verbal anchors for marking assessments. In the construction of the probabilistic part of our influence diagram, the method proved to allow for the elicitation of many probabilities in little time.
An Update Semantics for Defeasible Obligations
van der Torre, Leendert, Tan, Yao-Hua
The deontic logic DUS is a Deontic Update Semantics for prescriptive obligations based on the update semantics of Veltman. In DUS the definition of logical validity of obligations is not based on static truth values but on dynamic action transitions. In this paper prescriptive defeasible obligations are formalized in update semantics and the diagnostic problem of defeasible deontic logic is discussed. Assume a defeasible obligation `normally A ought to be (done)' together withthe fact `A is not (done).' Is this an exception of the normality claim, or is it a violation of the obligation? In this paper we formalize the heuristic principle that it is a violation, unless there is a more specific overriding obligation. The underlying motivation from legal reasoning is that criminals should have as little opportunities as possible to excuse themselves by claiming that their behavior was exceptional rather than criminal.
Multiplicative Factorization of Noisy-Max
Takikawa, Masami, D'Ambrosio, Bruce
The noisy-or and its generalization noisy-max have been utilized to reduce the complexity of knowledge acquisition. In this paper, we present a new representation of noisy-max that allows for efficient inference in general Bayesian networks. Empirical studies show that our method is capable of computing queries in well-known large medical networks, QMR-DT and CPCS, for which no previous exact inference method has been shown to perform well.
Practical Uses of Belief Functions
We present examples where the use of belief functions provided sound and elegant solutions to real life problems. These are essentially characterized by ?missing' information. The examples deal with 1) discriminant analysis using a learning set where classes are only partially known; 2) an information retrieval systems handling inter-documents relationships; 3) the combination of data from sensors competent on partially overlapping frames; 4) the determination of the number of sources in a multi-sensor environment by studying the inter-sensors contradiction. The purpose of the paper is to report on such applications where the use of belief functions provides a convenient tool to handle ?messy' data problems.