Directed Networks
A Kernel Approach to Tractable Bayesian Nonparametrics
Huszár, Ferenc, Lacoste-Julien, Simon
Inference in popular nonparametric Bayesian models typically relies on sampling or other approximations. This paper presents a general methodology for constructing novel tractable nonparametric Bayesian methods by applying the kernel trick to inference in a parametric Bayesian model. For example, Gaussian process regression can be derived this way from Bayesian linear regression. Despite the success of the Gaussian process framework, the kernel trick is rarely explicitly considered in the Bayesian literature. In this paper, we aim to fill this gap and demonstrate the potential of applying the kernel trick to tractable Bayesian parametric models in a wider context than just regression. As an example, we present an intuitive Bayesian kernel machine for density estimation that is obtained by applying the kernel trick to a Gaussian generative model in feature space.
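The regression example mentioned in the abstract can be made concrete with a short derivation. This is a sketch under assumptions not stated in the abstract: a feature map $\phi$, a standard normal prior on the weights, Gaussian noise with variance $\sigma^2$, a design matrix $\Phi$ whose rows are $\phi(x_i)^\top$, and a test input $x_*$. All symbols are introduced here for illustration only.

```latex
% Bayesian linear regression in feature space (assumed setup):
%   y = \Phi w + \varepsilon, \quad w \sim \mathcal{N}(0, I), \quad
%   \varepsilon \sim \mathcal{N}(0, \sigma^2 I)
\begin{align*}
\mathbb{E}[f(x_*) \mid y]
  &= \phi(x_*)^\top \left(\Phi^\top \Phi + \sigma^2 I\right)^{-1} \Phi^\top y \\
  &= \phi(x_*)^\top \Phi^\top \left(\Phi \Phi^\top + \sigma^2 I\right)^{-1} y \\
  &= k_*^\top \left(K + \sigma^2 I\right)^{-1} y ,
\end{align*}
% where K_{ij} = \phi(x_i)^\top \phi(x_j) = k(x_i, x_j)
% and (k_*)_i = \phi(x_i)^\top \phi(x_*) = k(x_i, x_*).
```

The second line uses the push-through identity $(\Phi^\top \Phi + \sigma^2 I)^{-1} \Phi^\top = \Phi^\top (\Phi \Phi^\top + \sigma^2 I)^{-1}$. The final expression depends on the features only through inner products, so any positive definite kernel $k$ can be substituted, which gives the familiar Gaussian process predictive mean.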
Limits of Preprocessing
Many important computational problems that arise in various areas of AI are intractable. Nevertheless, AI research has been very successful in developing and implementing heuristic solvers that work well on real-world instances. An important component of virtually every solver is a powerful polynomial-time preprocessing procedure that reduces the problem input. For instance, preprocessing techniques for the propositional satisfiability problem are based on Boolean Constraint Propagation (see, e.g., Eén and Biere, 2005), CSP solvers make use of various local consistency algorithms that filter the domains of variables (see, e.g., Bessière, 2006); similar preprocessing methods are used by solvers for Nonmonotonic and Bayesian reasoning problems (see, e.g., Gebser et al., 2008, Bolt and van der Gaag, 2006, respectively). Until recently, no provable performance guarantees for polynomial-time preprocessing methods had been obtained, and so preprocessing was only the subject of empirical studies. A possible reason for the lack of theoretical results is a certain inadequacy of the P vs NP framework for such an analysis: if we could reduce in polynomial time an instance of an NP-hard problem just by one bit, then we could solve the entire problem in polynomial time by repeating the reduction step a polynomial number of times, and P = NP follows. With the advent of parameterized complexity (Downey, Fellows, and Stege, 1999), a new theoretical framework became available that provides suitable tools to analyze the power of preprocessing. Parameterized complexity considers a problem in a two-dimensional setting, where in addition to the input size n, a problem parameter k is taken into consideration.
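To illustrate what a polynomial-time preprocessing step for satisfiability can look like, here is a minimal sketch of unit propagation, the usual core of Boolean Constraint Propagation. It is not the preprocessor of any cited solver. The function name `unit_propagate` and the DIMACS-style encoding (positive and negative integers as literals) are assumptions made for this example.

```python
def unit_propagate(clauses):
    """Simplify a CNF formula by repeatedly applying unit clauses.

    clauses: list of lists of nonzero ints (hypothetical DIMACS-style literals).
    Returns (simplified_clauses, assignment); simplified_clauses is None
    if propagation derives an empty clause (a conflict).
    """
    clauses = [set(c) for c in clauses]
    assignment = {}
    while True:
        # Find any clause with exactly one literal; it forces that literal.
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return [sorted(c) for c in clauses], assignment
        lit = next(iter(unit))
        assignment[abs(lit)] = lit > 0
        remaining = []
        for c in clauses:
            if lit in c:
                continue  # clause is satisfied, drop it
            if -lit in c:
                c = c - {-lit}  # the falsified literal is removed
                if not c:
                    return None, assignment  # empty clause: conflict
            remaining.append(c)
        clauses = remaining


# Example: x1 is forced, which forces x2, which shortens the third clause.
simplified, forced = unit_propagate([[1], [-1, 2], [-2, 3, 4], [5, 6]])
print(simplified, forced)
```

Each pass either removes a clause or shortens one, so the loop terminates after a number of steps polynomial in the formula size. The reduced instance is equisatisfiable with the input, which is the property a preprocessing step needs.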
Robust graphical modeling of gene networks using classical and alternative T-distributions
Finegold, Michael, Drton, Mathias
Graphical Gaussian models have proven to be useful tools for exploring network structures based on multivariate data. Applications to studies of gene expression have generated substantial interest in these models, and resulting recent progress includes the development of fitting methodology involving penalization of the likelihood function. In this paper we advocate the use of multivariate $t$-distributions for more robust inference of graphs. In particular, we demonstrate that penalized likelihood inference combined with an application of the EM algorithm provides a computationally efficient approach to model selection in the $t$-distribution case. We consider two versions of multivariate $t$-distributions, one of which requires the use of approximation techniques. For this distribution, we describe a Markov chain Monte Carlo EM algorithm based on a Gibbs sampler as well as a simple variational approximation that makes the resulting method feasible in large problems.
Distance Dependent Chinese Restaurant Processes
Blei, David M., Frazier, Peter I.
We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show that its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
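The way a distance dependent CRP induces a partition can be sketched as follows. Each customer links to another customer with probability proportional to a decay function of their distance, or to itself with probability proportional to a concentration parameter, and clusters are the connected components of the resulting links. This draws from the prior only, not the Gibbs sampler from the paper. The names `sample_ddcrp_partition`, `alpha`, and `decay`, the exponential decay, and the sequential distances are all assumptions chosen for illustration.

```python
import math
import random


def sample_ddcrp_partition(distances, alpha, decay, rng):
    """Draw customer links and the induced cluster labels from a ddCRP prior.

    distances[i][j]: distance from customer i to customer j (may be math.inf).
    alpha: weight of a self-link (customer starts or stays at its own table).
    decay: non-negative function mapping a distance to a link weight.
    """
    n = len(distances)
    links = []
    for i in range(n):
        weights = [alpha if j == i else decay(distances[i][j]) for j in range(n)]
        # Only customers with positive weight are eligible link targets.
        candidates = [j for j in range(n) if weights[j] > 0]
        chosen = rng.choices(candidates, weights=[weights[j] for j in candidates])[0]
        links.append(chosen)

    # Clusters are connected components of the link graph (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    relabel = {}
    labels = [relabel.setdefault(find(i), len(relabel)) for i in range(n)]
    return links, labels


# Sequential data: customer i may only link to itself or to earlier customers.
n = 10
seq_dist = [[(i - j) if j <= i else math.inf for j in range(n)] for i in range(n)]
links, labels = sample_ddcrp_partition(
    seq_dist, alpha=1.0, decay=lambda d: math.exp(-d), rng=random.Random(0)
)
print(links, labels)
```

Because assignments are expressed as customer-to-customer links rather than customer-to-table assignments, the distribution over partitions need not be exchangeable. Setting the decay to a constant for all earlier customers recovers the traditional CRP in the sequential case.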
Visualizing and Understanding Large-Scale Bayesian Networks
Cossalter, Michele (Carnegie Mellon University) | Mengshoel, Ole (Carnegie Mellon University) | Selker, Ted (Carnegie Mellon University)
Bayesian networks are a theoretically well-founded approach to represent large multi-variate probability distributions, and have proven useful in a broad range of applications. While several software tools for visualizing and editing Bayesian networks exist, they have important weaknesses when it comes to enabling users to clearly understand and compare conditional probability tables in the context of network topology, especially in large-scale networks. This paper describes a system for improving the ability for computers to work with people to develop intelligent systems through the construction of high-performing Bayesian networks. We describe NetEx, a tool developed as a Cytoscape plug-in, which allows a user to visually inspect and compare details concerning multiple nodes in a Bayesian network while maintaining awareness of their network context. It uses a "thought bubble line" to connect nodes in a graph representation and their internal information at the side of the graph. The tool seeks to improve the ability of experts to analyze and debug large Bayesian network models, and to help people to understand how alternative algorithms and Bayesian networks operate, providing insights into how to improve them.
Towards Detection of Suspicious Behavior from Multiple Observations
Kaluza, Bostjan (Jozef Stefan Institute) | Kaminka, Gal (Bar Ilan University) | Tambe, Milind (University of Southern California)
This paper addresses the problem of detecting suspicious behavior from a collection of an individual's events, where no single event is enough to decide whether his/her behavior is suspicious, but the combination of multiple events enables reasoning. We establish a Bayesian framework for evaluating multiple events and show that current approaches fail to include behavior history when estimating whether a trace of events was generated by a suspicious agent. We propose a heuristic for evaluating events according to the behavior of the agent in the past. The proposed approach, tested on an airport domain, outperforms the current approaches.
Modeling Bounded Rationality of Agents During Interactions
Guo, Qing (University of Illinois at Chicago) | Gmytrasiewicz, Piotr (University of Illinois at Chicago)
Frequently, it is advantageous for an agent to model other agents in order to predict their behavior during an interaction. Modeling others as rational has a long tradition in AI and game theory, but modeling other agents' departures from rationality is difficult and controversial. This paper proposes that bounded rationality be modeled as errors the agent being modeled is making while deciding on its action. We are motivated by the work on quantal response equilibria in behavioral game theory which uses Nash equilibria as the solution concept. In contrast, we use decision-theoretic maximization of expected utility. Quantal response assumes that a decision maker is rational, i.e., is maximizing his expected utility, but only approximately so, with an error rate characterized by a single error parameter. Another agent's error rate may be unknown and needs to be estimated during an interaction. We show that the error rate of the quantal response can be estimated using Bayesian update of a suitable conjugate prior, and that it has a finite-dimensional sufficient statistic under strong simplifying assumptions. However, if the simplifying assumptions are relaxed, the quantal response does not admit a finite sufficient statistic and a more complex update is needed. This confirms the difficulty of using simple models of bounded rationality in general settings.
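The idea of estimating another agent's error parameter from observed actions can be sketched with the common logit form of quantal response, where an action is chosen with probability proportional to the exponential of its scaled expected utility. The abstract does not specify the conjugate prior, so this sketch swaps in a plain grid approximation of the posterior instead. The function names, the parameter `lam`, and the grid values are assumptions made for illustration.

```python
import math


def quantal_response(utilities, lam):
    """Logit quantal response: P(a) is proportional to exp(lam * utility[a]).

    lam = 0 gives uniformly random play; large lam approaches exact
    expected-utility maximization.
    """
    top = max(utilities)
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = [math.exp(lam * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def update_grid_posterior(grid, prior, utilities, observed_action):
    """One Bayesian update of a discrete belief over lam (grid approximation)."""
    unnormalized = [
        p * quantal_response(utilities, lam)[observed_action]
        for lam, p in zip(grid, prior)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical interaction: the other agent picks the better of two actions
# five times in a row, so belief shifts toward larger lam (fewer errors).
grid = [0.0, 0.5, 1.0, 2.0, 4.0]
belief = [1.0 / len(grid)] * len(grid)
for _ in range(5):
    belief = update_grid_posterior(grid, belief, utilities=[1.0, 0.0], observed_action=0)
print(belief)
```

A grid makes the update easy to inspect but scales poorly. It also needs the utilities faced at each observation, which hints at why a finite sufficient statistic is only available under simplifying assumptions.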
A Microtext Corpus for Persuasion Detection in Dialog
Young, Joel (Naval Postgraduate School) | Martell, Craig (Naval Postgraduate School) | Anand, Pranav (University of California, Santa Cruz) | Ortiz, Pedro (United States Naval Academy) | Henry Tucker Gilbert, IV (Naval Postgraduate School)
Automatic detection of persuasion is essential for machine interaction on the social web. To facilitate automated persuasion detection, we present a novel microtext corpus derived from hostage negotiation transcripts as well as a detailed manual (codebook) for persuasion annotation. Our corpus, called the NPS Persuasion Corpus, consists of 37 transcripts from four sets of hostage negotiation transcriptions. Each utterance in the corpus is hand-annotated for one of nine categories of persuasion based on Cialdini's model: reciprocity, commitment, consistency, liking, authority, social proof, scarcity, other, and not persuasive. Initial results using three supervised learning algorithms (Naïve Bayes, Maximum Entropy, and Support Vector Machines) combined with gappy and orthogonal sparse bigram feature expansion techniques show that the annotation process did capture machine learnable features of persuasion with F-scores better than baseline.
Human Activity Detection from RGBD Images
Sung, Jaeyong (Cornell University) | Ponce, Colin (Cornell University) | Selman, Bart (Cornell University) | Saxena, Ashutosh (Cornell University)
Being able to detect and recognize human activities is important for making personal assistant robots useful in performing assistive tasks. The challenge is to develop a system that is low-cost, reliable in unstructured home settings, and also straightforward to use. In this paper, we use an RGBD sensor (Microsoft Kinect) as the input sensor, and present learning algorithms to infer the activities. Our algorithm is based on a hierarchical maximum entropy Markov model (MEMM). It considers a person's activity as composed of a set of sub-activities, and infers the two-layered graph structure using a dynamic programming approach. We test our algorithm on detecting and recognizing twelve different activities performed by four people in different environments, such as a kitchen, a living room, and an office, and achieve an average performance of 84.3% when the person was seen before in the training set (and 64.2% when the person was not seen before).
Fixing a Hole in Lexicalized Plan Recognition
Geib, Christopher (University of Edinburgh)
Previous work has suggested the use of lexicalized grammars for probabilistic plan recognition. Such grammars allow the domain builder to delay commitment to hypothesizing high-level goals in order to reduce computational costs. However, this delay has limitations. In the case of only partial observation traces, delaying commitment can prevent such algorithms from forming correct conclusions about some goals. This paper presents a heuristic metric to address this limitation. It advocates computing the maximum change in conditional probability across all the computed explanations given the observations, explicitly considering a goal of interest.