Learning Graphical Models
Decentralized, Adaptive, Look-Ahead Particle Filtering
Ahmed, Mohamed Osama, Bibalan, Pouyan T., de Freitas, Nando, Fauvel, Simon
The decentralized particle filter (DPF) was proposed recently to increase the level of parallelism of particle filtering. Given a decomposition of the state space into two nested sets of variables, the DPF uses a particle filter to sample the first set and then conditions on this sample to generate a set of samples for the second set of variables. The DPF can be understood as a variant of the popular Rao-Blackwellized particle filter (RBPF), where the second step is carried out using Monte Carlo approximations instead of analytical inference. As a result, the range of applications of the DPF is broader than the one for the RBPF. In this paper, we improve the DPF in two ways. First, we derive a Monte Carlo approximation of the optimal proposal distribution and, consequently, design and implement a more efficient look-ahead DPF. Although the decentralized filters were initially designed to capitalize on parallel implementation, we show that the look-ahead DPF can outperform the standard particle filter even on a single machine. Second, we propose the use of bandit algorithms to automatically configure the state space decomposition of the DPF.
Scaling Inference for Markov Logic with a Task-Decomposition Approach
Niu, Feng, Zhang, Ce, Ré, Christopher, Shavlik, Jude
Motivated by applications in large-scale knowledge base construction, we study the problem of scaling up a sophisticated statistical inference framework called Markov Logic Networks (MLNs). Our approach, Felix, uses the idea of Lagrangian relaxation from mathematical programming to decompose a program into smaller tasks while preserving the joint-inference property of the original MLN. The advantage is that we can use highly scalable specialized algorithms for common tasks such as classification and coreference. We propose an architecture to support Lagrangian relaxation in an RDBMS which we show enables scalable joint inference for MLNs. We empirically validate that Felix is significantly more scalable and efficient than prior approaches to MLN inference by constructing a knowledge base from 1.8M documents as part of the TAC challenge. We show that Felix scales and achieves state-of-the-art quality numbers. In contrast, prior approaches do not scale even to a subset of the corpus that is three orders of magnitude smaller.
Sparsity-Promoting Bayesian Dynamic Linear Models
Caron, François, Bornn, Luke, Doucet, Arnaud
Sparsity-promoting priors have become increasingly popular over recent years due to an increased number of regression and classification applications involving a large number of predictors. In time series applications where observations are collected over time, it is often unrealistic to assume that the underlying sparsity pattern is fixed. We propose here an original class of flexible Bayesian linear models for dynamic sparsity modelling. The proposed class of models expands upon the existing Bayesian literature on sparse regression using generalized multivariate hyperbolic distributions. The properties of the models are explored through both analytic results and simulation studies. We demonstrate the model on a financial application where it is shown that it accurately represents the patterns seen in the analysis of stock and derivative data, and is able to detect major events by filtering an artificial portfolio of assets.
Inference in Hidden Markov Models with Explicit State Duration Distributions
Dewar, Michael, Wiggins, Chris, Wood, Frank
Hidden Markov models (HMMs) are a fundamental tool for data analysis and exploration. Many variants of the basic HMM have been developed in response to shortcomings in the original HMM formulation [9]. In this paper we address inference in the explicit state duration HMM (EDHMM). By state duration we mean the amount of time an HMM dwells in a state. In the standard HMM specification, a state's duration is implicit and, a priori, distributed geometrically. The EDHMM (or, equivalently, the hidden semi-Markov model [12]) was developed to allow explicit parameterization and direct inference of state duration distributions. EDHMM estimation and inference can be performed using the forward-backward algorithm; though only if the sequence is short or a tight "allowable" duration interval for each state is hard-coded a priori [13]. If the sequence is short then forward-backward can be run on a state representation that allows for all possible durations up to the observed sequence length. If the sequence is long then forward-backward only remains computationally tractable if only transitions between durations that lie within pre-specified allowable intervals are considered.
One Decade of Universal Artificial Intelligence
The first decade of this century has seen the nascency of the first mathematical theory of general artificial intelligence. This theory of Universal Artificial Intelligence (UAI) has made significant contributions to many theoretical, philosophical, and practical AI questions. In a series of papers culminating in book (Hutter, 2005), an exciting sound and complete mathematical model for a super intelligent agent (AIXI) has been developed and rigorously analyzed. While nowadays most AI researchers avoid discussing intelligence, the award-winning PhD thesis (Legg, 2008) provided the philosophical embedding and investigated the UAI-based universal measure of rational intelligence, which is formal, objective and non-anthropocentric. Recently, effective approximations of AIXI have been derived and experimentally investigated in JAIR paper (Veness et al. 2011). This practical breakthrough has resulted in some impressive applications, finally muting earlier critique that UAI is only a theory. For the first time, without providing any domain knowledge, the same agent is able to self-adapt to a diverse range of interactive environments. For instance, AIXI is able to learn from scratch to play TicTacToe, Pacman, Kuhn Poker, and other games by trial and error, without even providing the rules of the games. These achievements give new hope that the grand goal of Artificial General Intelligence is not elusive. This article provides an informal overview of UAI in context. It attempts to gently introduce a very theoretical, formal, and mathematical subject, and discusses philosophical and technical ingredients, traits of intelligence, some social questions, and the past and future of UAI.
Exploiting Model Equivalences for Solving Interactive Dynamic Influence Diagrams
We focus on the problem of sequential decision making in partially observable environments shared with other agents of uncertain types having similar or conflicting objectives. This problem has been previously formalized by multiple frameworks one of which is the interactive dynamic influence diagram (I-DID), which generalizes the well-known influence diagram to the multiagent setting. I-DIDs are graphical models and may be used to compute the policy of an agent given its belief over the physical state and others' models, which changes as the agent acts and observes in the multiagent setting. As we may expect, solving I-DIDs is computationally hard. This is predominantly due to the large space of candidate models ascribed to the other agents and its exponential growth over time. We present two methods for reducing the size of the model space and stemming its exponential growth. Both these methods involve aggregating individual models into equivalence classes. Our first method groups together behaviorally equivalent models and selects only those models for updating which will result in predictive behaviors that are distinct from others in the updated model space. The second method further compacts the model space by focusing on portions of the behavioral predictions. Specifically, we cluster actionally equivalent models that prescribe identical actions at a single time step. Exactly identifying the equivalences would require us to solve all models in the initial set. We avoid this by selectively solving some of the models, thereby introducing an approximation. We discuss the error introduced by the approximation, and empirically demonstrate the improved efficiency in solving I-DIDs due to the equivalences.
Minimax-Optimal Bounds for Detectors Based on Estimated Prior Probabilities
Jiao, Jiantao, Zhang, Lin, Nowak, Robert
In many signal detection and classification problems, we have knowledge of the distribution under each hypothesis, but not the prior probabilities. This paper is aimed at providing theory to quantify the performance of detection via estimating prior probabilities from either labeled or unlabeled training data. The error or {\em risk} is considered as a function of the prior probabilities. We show that the risk function is locally Lipschitz in the vicinity of the true prior probabilities, and the error of detectors based on estimated prior probabilities depends on the behavior of the risk function in this locality. In general, we show that the error of detectors based on the Maximum Likelihood Estimate (MLE) of the prior probabilities converges to the Bayes error at a rate of $n^{-1/2}$, where $n$ is the number of training data. If the behavior of the risk function is more favorable, then detectors based on the MLE have errors converging to the corresponding Bayes errors at optimal rates of the form $n^{-(1+\alpha)/2}$, where $\alpha>0$ is a parameter governing the behavior of the risk function with a typical value $\alpha = 1$. The limit $\alpha \rightarrow \infty$ corresponds to a situation where the risk function is flat near the true probabilities, and thus insensitive to small errors in the MLE; in this case the error of the detector based on the MLE converges to the Bayes error exponentially fast with $n$. We show the bounds are achievable no matter given labeled or unlabeled training data and are minimax-optimal in labeled case.
Learning the Nature of Information in Social Networks
Agrawal, Rakesh (Microsoft) | Potamias, Michalis (Groupon) | Terzi, Evimaria (Boston University)
We postulate that the nature of information items plays a vital role in the observed spread of these items in a social network. We capture this intuition by proposing a model that assigns to every information item two parameters: endogeneity and exogeneity. The endogeneity of the item quantifies its tendency to spread primarily through the connections between nodes; the exogeneity quantifies its tendency to be acquired by the nodes, independently of the underlying network. We also extend this item-based model to take into account the openness of each node to new information. We quantify openness by introducing the receptivity of a node. Given a social network and data related to the ordering of adoption of information items by nodes, we develop a maximum-likelihood framework for estimating endogeneity, exogeneity and receptivity parameters. We apply our methodology to synthetic and real data and demonstrate its efficacy as a data-analytic tool.
Where Is This Tweet From? Inferring Home Locations of Twitter Users
Mahmud, Jalal (IBM Research - Almaden) | Nichols, Jeffrey (IBM Research - Almaden) | Drews, Clemens (IBM Research - Almaden)
We present a new algorithm for inferring the home locations of Twitter users at different granularities, such as city, state, or time zone, using the content of their tweets and their tweeting behavior. Unlike existing approaches, our algorithm uses an ensemble of statistical and heuristic classifiers to predict locations. We find that a hierarchical classification approach can improve prediction accuracy. Experimental evidence suggests that our algorithm works well in practice and outperforms the best existing algorithms for predicting the location of Twitter users.
Identifying Microblogs for Targeted Contextual Advertising
Dave, Kushal Shailesh (International Institute of Information Technology, Hyderabad) | Varma, Vasudeva (International Institute of Information Technology, Hyderabad)
Micro-blogging sites such as Facebook, Twitter, Google+ present a nice opportunity for targeting advertisements that are contextually related to the microblog content. By virtue of the sparse and noisy text makes identifying the microblogs suitable for advertising a very hard problem. In this work, we approach the problem of identifying the microblogs that could be targeted for advertisements as a two-step classification approach. In the first pass, microblogs suitable for advertising are identified. Next, in the second pass, we build a model to find the sentiment of the advertisable microblog. The systems use features derived from the Part-of-speech tags, the tweet content and uses external resources such as query logs and n-gram dictionaries from previously labeled data.This work aims at providing a thorough insight into the problem and analyzing various features to assess which features contribute the most towards identifying the tweets that can be targeted for advertisements.