Goto

Collaborating Authors

 Learning Graphical Models


Finding the Graph of Epidemic Cascades

arXiv.org Machine Learning

We consider the problem of finding the graph on which an epidemic cascade spreads, given only the times when each node gets infected. While this is a problem of importance in several contexts -- offline and online social networks, e-commerce, epidemiology, vulnerabilities in infrastructure networks -- there has been very little work, analytical or empirical, on finding the graph. Clearly, it is impossible to do so from just one cascade; our interest is in learning the graph from a small number of cascades. For the classic and popular "independent cascade" SIR epidemics, we analytically establish the number of cascades required by both the global maximum-likelihood (ML) estimator, and a natural greedy algorithm. Both results are based on a key observation: the global graph learning problem decouples into $n$ local problems -- one for each node. For a node of degree $d$, we show that its neighborhood can be reliably found once it has been infected $O(d^2 \log n)$ times (for ML on general graphs) or $O(d\log n)$ times (for greedy on trees). We also provide a corresponding information-theoretic lower bound of $\Omega(d\log n)$; thus our bounds are essentially tight. Furthermore, if we are given side-information in the form of a super-graph of the actual graph (as is often the case), then the number of cascade samples required -- in all cases -- becomes independent of the network size $n$. Finally, we show that for a very general SIR epidemic cascade model, the Markov graph of infection times is obtained via the moralization of the network graph.


Improving feature selection algorithms using normalised feature histograms

arXiv.org Artificial Intelligence

The proposed feature selection method builds a histogram of the most stable features from random subsets of a training set and ranks the features based on a classifier based cross-validation. This approach reduces the instability of features obtained by conventional feature selection methods that occur with variation in training data and selection criteria. Classification results on four microarray and three image datasets using three major feature selection criteria and a naive Bayes classifier show considerable improvement over benchmark results.


Location-Based Reasoning about Complex Multi-Agent Behavior

Journal of Artificial Intelligence Research

Recent research has shown that surprisingly rich models of human activity can be learned from GPS (positional) data. However, most effort to date has concentrated on modeling single individuals or statistical properties of groups of people. Moreover, prior work focused solely on modeling actual successful executions (and not failed or attempted executions) of the activities of interest. We, in contrast, take on the task of understanding human interactions, attempted interactions, and intentions from noisy sensor data in a fully relational multi-agent setting. We use a real-world game of capture the flag to illustrate our approach in a well-defined domain that involves many distinct cooperative and competitive joint activities. We model the domain using Markov logic, a statistical-relational language, and learn a theory that jointly denoises the data and infers occurrences of high-level activities, such as a player capturing an enemy. Our unified model combines constraints imposed by the geometry of the game area, the motion model of the players, and by the rules and dynamics of the game in a probabilistically and logically sound fashion. We show that while it may be impossible to directly detect a multi-agent activity due to sensor noise or malfunction, the occurrence of the activity can still be inferred by considering both its impact on the future behaviors of the people involved as well as the events that could have preceded it. Further, we show that given a model of successfully performed multi-agent activities, along with a set of examples of failed attempts at the same activities, our system automatically learns an augmented model that is capable of recognizing success and failure, as well as goals of people's actions with high accuracy. We compare our approach with other alternatives and show that our unified model, which takes into account not only relationships among individual players, but also relationships among activities over the entire length of a game, although more computationally costly, is significantly more accurate. Finally, we demonstrate that explicitly modeling unsuccessful attempts boosts performance on other important recognition tasks.


Feature Selection for Value Function Approximation Using Bayesian Model Selection

arXiv.org Artificial Intelligence

Feature selection in reinforcement learning (RL), i.e. choosing basis functions such that useful approximations of the unkown value function can be obtained, is one of the main challenges in scaling RL to real-world applications. Here we consider the Gaussian process based framework GPTD for approximate policy evaluation, and propose feature selection through marginal likelihood optimization of the associated hyperparameters. Our approach has two appealing benefits: (1) given just sample transitions, we can solve the policy evaluation problem fully automatically (without looking at the learning task, and, in theory, independent of the dimensionality of the state space), and (2) model selection allows us to consider more sophisticated kernels, which in turn enable us to identify relevant subspaces and eliminate irrelevant state variables such that we can achieve substantial computational savings and improved prediction performance.


Ensemble Risk Modeling Method for Robust Learning on Scarce Data

arXiv.org Machine Learning

In medical risk modeling, typical data are "scarce": they have relatively small number of training instances (N), censoring, and high dimensionality (M). We show that the problem may be effectively simplified by reducing it to bipartite ranking, and introduce new bipartite ranking algorithm, Smooth Rank, for robust learning on scarce data. The algorithm is based on ensemble learning with unsupervised aggregation of predictors. The advantage of our approach is confirmed in comparison with two "gold standard" risk modeling methods on 10 real life survival analysis datasets, where the new approach has the best results on all but two datasets with the largest ratio N/M. For systematic study of the effects of data scarcity on modeling by all three methods, we conducted two types of computational experiments: on real life data with randomly drawn training sets of different sizes, and on artificial data with increasing number of features. Both experiments demonstrated that Smooth Rank has critical advantage over the popular methods on the scarce data; it does not suffer from overfitting where other methods do.


Threshold Choice Methods: the Missing Link

arXiv.org Artificial Intelligence

Many performance metrics have been introduced for the evaluation of classification performance, with different origins and niches of application: accuracy, macro-accuracy, area under the ROC curve, the ROC convex hull, the absolute error, and the Brier score (with its decomposition into refinement and calibration). One way of understanding the relation among some of these metrics is the use of variable operating conditions (either in the form of misclassification costs or class proportions). Thus, a metric may correspond to some expected loss over a range of operating conditions. One dimension for the analysis has been precisely the distribution we take for this range of operating conditions, leading to some important connections in the area of proper scoring rules. However, we show that there is another dimension which has not received attention in the analysis of performance metrics. This new dimension is given by the decision rule, which is typically implemented as a threshold choice method when using scoring models. In this paper, we explore many old and new threshold choice methods: fixed, score-uniform, score-driven, rate-driven and optimal, among others. By calculating the loss of these methods for a uniform range of operating conditions we get the 0-1 loss, the absolute error, the Brier score (mean squared error), the AUC and the refinement loss respectively. This provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, namely: take a model, apply several threshold choice methods consistent with the information which is (and will be) available about the operating condition, and compare their expected losses. In order to assist in this procedure we also derive several connections between the aforementioned performance metrics, and we highlight the role of calibration in choosing the threshold choice method.


Combinatorial Modelling and Learning with Prediction Markets

arXiv.org Artificial Intelligence

Combining models in appropriate ways to achieve high performance is commonly seen in machine learning fields today. Although a large amount of combinatorial models have been created, little attention is drawn to the commons in different models and their connections. A general modelling technique is thus worth studying to understand model combination deeply and shed light on creating new models. Prediction markets show a promise of becoming such a generic, flexible combinatorial model. By reviewing on several popular combinatorial models and prediction market models, this paper aims to show how the market models can generalise different combinatorial stuctures and how they implement these popular combinatorial models in specific conditions. Besides, we will see among different market models, Storkey's \emph{Machine Learning Markets} provide more fundamental, generic modelling mechanisms than the others, and it has a significant appeal for both theoretical study and application.


On the equivalence of Hopfield Networks and Boltzmann Machines

arXiv.org Artificial Intelligence

A specific type of neural network, the Restricted Boltzmann Machine (RBM), is implemented for classification and feature detection in machine learning. RBM is characterized by separate layers of visible and hidden units, which are able to learn efficiently a generative model of the observed data. We study a "hybrid" version of RBM's, in which hidden units are analog and visible units are binary, and we show that thermodynamics of visible units are equivalent to those of a Hopfield network, in which the N visible units are the neurons and the P hidden units are the learned patterns. We apply the method of stochastic stability to derive the thermodynamics of the model, by considering a formal extension of this technique to the case of multiple sets of stored patterns, which may act as a benchmark for the study of correlated sets. Our results imply that simulating the dynamics of a Hopfield network, requiring the update of N neurons and the storage of N(N-1)/2 synapses, can be accomplished by a hybrid Boltzmann Machine, requiring the update of N+P neurons but the storage of only NP synapses. In addition, the well known glass transition of the Hopfield network has a counterpart in the Boltzmann Machine: It corresponds to an optimum criterion for selecting the relative sizes of the hidden and visible layers, resolving the trade-off between flexibility and generality of the model. The low storage phase of the Hopfield model corresponds to few hidden units and hence a overly constrained RBM, while the spin-glass phase (too many hidden units) corresponds to unconstrained RBM prone to overfitting of the observed data.


Distance-Based Bias in Model-Directed Optimization of Additively Decomposable Problems

arXiv.org Artificial Intelligence

For many optimization problems it is possible to define a distance metric between problem variables that correlates with the likelihood and strength of interactions between the variables. For example, one may define a metric so that the dependencies between variables that are closer to each other with respect to the metric are expected to be stronger than the dependencies between variables that are further apart. The purpose of this paper is to describe a method that combines such a problem-specific distance metric with information mined from probabilistic models obtained in previous runs of estimation of distribution algorithms with the goal of solving future problem instances of similar type with increased speed, accuracy and reliability. While the focus of the paper is on additively decomposable problems and the hierarchical Bayesian optimization algorithm, it should be straightforward to generalize the approach to other model-directed optimization techniques and other problem classes. Compared to other techniques for learning from experience put forward in the past, the proposed technique is both more practical and more broadly applicable.


A Split-Merge MCMC Algorithm for the Hierarchical Dirichlet Process

arXiv.org Artificial Intelligence

The hierarchical Dirichlet process (HDP) has become an important Bayesian nonparametric model for grouped data, such as document collections. The HDP is used to construct a flexible mixed-membership model where the number of components is determined by the data. As for most Bayesian nonparametric models, exact posterior inference is intractable---practitioners use Markov chain Monte Carlo (MCMC) or variational inference. Inspired by the split-merge MCMC algorithm for the Dirichlet process (DP) mixture model, we describe a novel split-merge MCMC sampling algorithm for posterior inference in the HDP. We study its properties on both synthetic data and text corpora. We find that split-merge MCMC for the HDP can provide significant improvements over traditional Gibbs sampling, and we give some understanding of the data properties that give rise to larger improvements.