Goto

Collaborating Authors

 Learning Graphical Models


Gibbs posterior for variable selection in high-dimensional classification and data mining

arXiv.org Machine Learning

In the popular approach of "Bayesian variable selection" (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. A completely new direction will be considered here to study BVS with a Gibbs posterior originating in statistical mechanics. The Gibbs posterior is constructed from a risk function of practical interest (such as the classification error) and aims at minimizing a risk function without modeling the data probabilistically. This can improve the performance over the usual Bayesian approach, which depends on a probability model which may be misspecified. Conditions will be provided to achieve good risk performance, even in the presence of high dimensionality, when the number of candidate variables "$K$" can be much larger than the sample size "$n$." In addition, we develop a convenient Markov chain Monte Carlo algorithm to implement BVS with the Gibbs posterior.


On the Possibility of Learning in Reactive Environments with Arbitrary Dependence

arXiv.org Artificial Intelligence

We address the problem of reinforcement learning in which observations may exhibit an arbitrary form of stochastic dependence on past observations and actions, i.e. environments more general than (PO)MDPs. The task for an agent is to attain the best possible asymptotic reward where the true generating environment is unknown but belongs to a known countable family of environments. We find some sufficient conditions on the class of environments under which an agent exists which attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to different probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.


Choice of neighbor order in nearest-neighbor classification

arXiv.org Machine Learning

The $k$th-nearest neighbor rule is arguably the simplest and most intuitively appealing nonparametric classification procedure. However, application of this method is inhibited by lack of knowledge about its properties, in particular, about the manner in which it is influenced by the value of $k$; and by the absence of techniques for empirical choice of $k$. In the present paper we detail the way in which the value of $k$ determines the misclassification error. We consider two models, Poisson and Binomial, for the training samples. Under the first model, data are recorded in a Poisson stream and are "assigned" to one or other of the two populations in accordance with the prior probabilities. In particular, the total number of data in both training samples is a Poisson-distributed random variable. Under the Binomial model, however, the total number of data in the training samples is fixed, although again each data value is assigned in a random way. Although the values of risk and regret associated with the Poisson and Binomial models are different, they are asymptotically equivalent to first order, and also to the risks associated with kernel-based classifiers that are tailored to the case of two derivatives. These properties motivate new methods for choosing the value of $k$.


Statistical Learning Theory: Models, Concepts, and Results

arXiv.org Machine Learning

Statistical learning theory provides the theoretical basis for many of today's machine learning algorithms and is arguably one of the most beautifully developed branches of artificial intelligence in general. It originated in Russia in the 1960s and gained wide popularity in the 1990s following the development of the so-called Support Vector Machine (SVM), which has become a standard tool for pattern recognition in a variety of domains ranging from computer vision to computational biology. Providing the basis of new learning algorithms, however, was not the only motivation for developing statistical learning theory. It was just as much a philosophical one, attempting to answer the question of what it is that allows us to draw valid conclusions from empirical data. In this article we attempt to give a gentle, nontechnical overview over the key ideas and insights of statistical learning theory. We do not assume that the reader has a deep background in mathematics, statistics, or computer science. Given the nature of the subject matter, however, some familiarity with mathematical concepts and notations and some intuitive understanding of basic probability is required. There exist many excellent references to more technical surveys of the mathematics of statistical learning theory: the monographs by one of the founders of statistical learning theory (Vapnik, 1995, Vapnik, 1998), a brief overview over statistical learning theory in Section 5 of Schölkopf and Smola (2002), more technical overview papers such as Bousquet et al. (2003), Mendelson (2003), Boucheron et al. (2005), Herbrich and Williamson (2002), and the monograph Devroye et al. (1996).


On Similarities between Inference in Game Theory and Machine Learning

Journal of Artificial Intelligence Research

In this paper, we elucidate the equivalence between inference in game theory and machine learning. Our aim in so doing is to establish an equivalent vocabulary between the two domains so as to facilitate developments at the intersection of both fields, and as proof of the usefulness of this approach, we use recent developments in each field to make useful improvements to the other. More specifically, we consider the analogies between smooth best responses in fictitious play and Bayesian inference methods. Initially, we use these insights to develop and demonstrate an improved algorithm for learning in games based on probabilistic moderation. That is, by integrating over the distribution of opponent strategies (a Bayesian approach within machine learning) rather than taking a simple empirical average (the approach used in standard fictitious play) we derive a novel moderated fictitious play algorithm and show that it is more likely than standard fictitious play to converge to a payoff-dominant but risk-dominated Nash equilibrium in a simple coordination game. Furthermore we consider the converse case, and show how insights from game theory can be used to derive two improved mean field variational learning algorithms. We first show that the standard update rule of mean field variational learning is analogous to a Cournot adjustment within game theory. By analogy with fictitious play, we then suggest an improved update rule, and show that this results in fictitious variational play, an improved mean field variational learning algorithm that exhibits better convergence in highly or strongly connected graphical models. Second, we use a recent advance in fictitious play, namely dynamic fictitious play, to derive a derivative action variational learning algorithm, that exhibits superior convergence properties on a canonical machine learning problem (clustering a mixture distribution).


The many faces of optimism - Extended version

arXiv.org Artificial Intelligence

The exploration-exploitation dilemma has been an intriguing and unsolved problem within the framework of reinforcement learning. "Optimism in the face of uncertainty" and model building play central roles in advanced exploration methods. Here, we integrate several concepts and obtain a fast and simple algorithm. We show that the proposed algorithm finds a near-optimal policy in polynomial time, and give experimental evidence that it is robust and efficient compared to its ascendants.


A Rigorously Bayesian Beam Model and an Adaptive Full Scan Model for Range Finders in Dynamic Environments

Journal of Artificial Intelligence Research

This paper proposes and experimentally validates a Bayesian network model of a range finder adapted to dynamic environments. All modeling assumptions are rigorously explained, and all model parameters have a physical interpretation. This approach results in a transparent and intuitive model. With respect to the state of the art beam model this paper: (i) proposes a different functional form for the probability of range measurements caused by unmodeled objects, (ii) intuitively explains the discontinuity encountered in te state of the art beam model, and (iii) reduces the number of model parameters, while maintaining the same representational power for experimental data. The proposed beam model is called RBBM, short for Rigorously Bayesian Beam Model. A maximum likelihood and a variational Bayesian estimator (both based on expectation-maximization) are proposed to learn the model parameters. Furthermore, the RBBM is extended to a full scan model in two steps: first, to a full scan model for static environments and next, to a full scan model for general, dynamic environments. The full scan model accounts for the dependency between beams and adapts to the local sample density when using a particle filter. In contrast to Gaussian-based state of the art models, the proposed full scan model uses a sample-based approximation. This sample-based approximation enables handling dynamic environments and capturing multi-modality, which occurs even in simple static environments.


Determining the Unithood of Word Sequences using a Probabilistic Approach

arXiv.org Artificial Intelligence

Most research related to unithood were conducted as part of a larger effort for the determination of termhood. Consequently, novelties are rare in this small sub-field of term extraction. In addition, existing work were mostly empirically motivated and derived. We propose a new probabilistically-derived measure, independent of any influences of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from Google search engine for the measurement of unithood. Our comparative study using 1,825 test cases against an existing empirically-derived function revealed an improvement in terms of precision, recall and accuracy.


Networks of Influence Diagrams: A Formalism for Representing Agents' Beliefs and Decision-Making Processes

Journal of Artificial Intelligence Research

This paper presents Networks of Influence Diagrams (NID), a compact, natural and highly expressive language for reasoning about agents' beliefs and decision-making processes. NIDs are graphical structures in which agents' mental models are represented as nodes in a network; a mental model for an agent may itself use descriptions of the mental models of other agents. NIDs are demonstrated by examples, showing how they can be used to describe conflicting and cyclic belief structures, and certain forms of bounded rationality. In an opponent modeling domain, NIDs were able to outperform other computational agents whose strategies were not known in advance. NIDs are equivalent in representation to Bayesian games but they are more compact and structured than this formalism. In particular, the equilibrium definition for NIDs makes an explicit distinction between agents' optimal strategies, and how they actually behave in reality.


Anytime Induction of Low-cost, Low-error Classifiers: a Sampling-based Approach

Journal of Artificial Intelligence Research

Machine learning techniques are gaining prevalence in the production of a wide range of classifiers for complex real-world applications with nonuniform testing and misclassification costs. The increasing complexity of these applications poses a real challenge to resource management during learning and classification. In this work we introduce ACT (anytime cost-sensitive tree learner), a novel framework for operating in such complex environments. ACT is an anytime algorithm that allows learning time to be increased in return for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques, ACT approximates the cost of the subtree under each candidate split and favors the one with a minimal cost. As a stochastic algorithm, ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare ACT to the state-of-the-art cost-sensitive tree learners. The results show that for the majority of domains ACT produces significantly less costly trees. ACT also exhibits good anytime behavior with diminishing returns.