 Learning Graphical Models


A Minimum Description Length Approach to Multitask Feature Selection

arXiv.org Artificial Intelligence

Many regression problems involve not one but several response variables (y's). Often the responses are suspected to share a common underlying structure, in which case it may be advantageous to share information across them; this is known as multitask learning. As a special case, we can use multiple responses to better identify shared predictive features -- a project we might call multitask feature selection. This thesis is organized as follows. Section 1 introduces feature selection for regression, focusing on $\ell_0$ regularization methods and their interpretation within a Minimum Description Length (MDL) framework. Section 2 proposes a novel extension of MDL feature selection to the multitask setting. The approach, called the "Multiple Inclusion Criterion" (MIC), is designed to borrow information across regression tasks by more easily selecting features that are associated with multiple responses. We show in experiments on synthetic and real biological data sets that MIC can reduce prediction error in settings where features are at least partially shared across responses. Section 3 surveys hypothesis testing by regression with a single response, focusing on the parallel between the standard Bonferroni correction and an MDL approach. Mirroring the ideas in Section 2, Section 4 proposes a novel MIC approach to hypothesis testing with multiple responses and shows that on synthetic data with significant sharing of features across responses, MIC sometimes outperforms standard FDR-controlling methods in terms of finding true positives for a given level of false positives. Section 5 concludes.


Characterizing predictable classes of processes

arXiv.org Artificial Intelligence

The problem is sequence prediction in the following setting. A sequence $x_1,...,x_n,...$ of discrete-valued observations is generated according to some unknown probabilistic law (measure) $\mu$. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure $\mu$ belongs to an arbitrary class $\mathcal{C}$ of stochastic processes. We are interested in predictors $\rho$ whose conditional probabilities converge to the "true" $\mu$-conditional probabilities if any $\mu\in\mathcal{C}$ is chosen to generate the data. We show that if such a predictor exists, then a predictor can also be obtained as a convex combination of countably many elements of $\mathcal{C}$. In other words, it can be obtained as a Bayesian predictor whose prior is concentrated on a countable set. This result is established for two very different measures of prediction performance, one of which is very strong, namely, total variation, and the other is very weak, namely, prediction in expected average Kullback-Leibler divergence.
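To make the idea of a Bayesian predictor with a prior on a countable set concrete, here is a minimal sketch. It is not the paper's construction: it assumes, purely for illustration, that the class consists of i.i.d. Bernoulli processes with parameters strictly between 0 and 1, and it uses a short finite list of parameters as a stand-in for a countable set. The function name `bayes_mixture_predict` and its arguments are hypothetical.

```python
def bayes_mixture_predict(thetas, prior, history):
    """Return rho(next = 1 | history) for a mixture of Bernoulli(theta) measures.

    thetas  : parameters of the component measures (assumed to lie in (0, 1))
    prior   : positive prior weights summing to 1, one per component
    history : observed binary outcomes x_1, ..., x_n
    """
    weights = list(prior)
    for x in history:
        # Multiply each weight by the likelihood the component gives to x,
        # then renormalize to obtain the posterior weights.
        weights = [w * (t if x == 1 else 1.0 - t) for w, t in zip(weights, thetas)]
        total = sum(weights)
        weights = [w / total for w in weights]
    # The mixture's conditional probability is the posterior-weighted average
    # of the components' conditional probabilities.
    return sum(w * t for w, t in zip(weights, thetas))


# Illustrative use: two candidate measures with a uniform prior.
p_next = bayes_mixture_predict([0.2, 0.8], [0.5, 0.5], [1, 1, 0, 1])
```

As more outcomes are observed, the posterior weights concentrate on the component that explains the data best, so in this toy setting the mixture's conditional probabilities approach those of the generating measure when it is one of the components.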



Multiagent Bayesian Forecasting of Time Series with Graphical Models

AAAI Conferences

Time series are found widely in engineering and science. We study multiagent forecasting in time series, drawing from literature on time series, graphical models, and multiagent systems. Knowledge representation of our agents is based on dynamic multiply sectioned Bayesian networks (DMSBNs), a class of cooperative multiagent graphical models. We propose a method through which agents can perform one-step forecasts with exact probabilistic inference. Superior performance of our agents over agents based on dynamic Bayesian networks (DBNs) is demonstrated through experiment.


Dynamic Programming Approximations for Partially Observable Stochastic Games

AAAI Conferences

Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes at a price, namely computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques which enable us to scale POSG algorithms by several orders of magnitude. We study both general POSGs and their cooperative counterpart, DEC-POMDPs. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies.


Constraint-based Approach to Discovery of Inter Module Dependencies in Modular Bayesian Networks

AAAI Conferences

This paper introduces an information theoretic approach to verification of modular causal probabilistic models. We assume systems which are gradually extended by adding new functional modules, each having limited domain knowledge captured by a local Bayesian network. Different modules originate from independent design processes. We assume that the local models are correct, which, however, does not guarantee globally coherent inference in composed systems. The introduced method supports discovery of significant inter module dependencies which are ignored in the assembled Bayesian network.


Join Tree Propagation Utilizing Both Arc Reversal and Variable Elimination

AAAI Conferences

In this paper, we put forth the first join tree propagation algorithm that selectively applies either arc reversal (AR) or variable elimination (VE) to build the propagated messages. Our approach utilizes a recent method for identifying the propagated join tree messages a priori. When it is determined that precisely one message is to be constructed at a join tree node, VE is utilized to build this distribution; otherwise, AR is applied as it is better suited to construct multiple distributions passed between neighboring join tree nodes. Experimental results, involving evidence processing in seven real-world Bayesian networks and one benchmark Bayesian network, empirically demonstrate that selectively applying VE and AR is faster than applying one of these methods exclusively on the entire network.


Identifying User Destinations in Virtual Worlds

AAAI Conferences

This paper focuses on the identification of human activity patterns in SecondLife (SL), a user-constructed virtual environment. SecondLife allows the users to create a virtual avatar, explore areas constructed by other users, socialize, and conduct financial transactions just as one would in the real world. However, unlike the real world, new attractions can be constructed within hours and previous ones often fall into disuse rapidly. Without current information about the state of regions in the virtual world, it is difficult to infer the purpose of the user’s actions from location information. In this paper, we present an approach for gathering data on users’ activities and building a map of SecondLife annotated with information about activities that the users were able to perform in each region. Using this map, a recommender agent built into the user’s heads-up display can present suggestions of other areas to visit based on data collected from previous users. We discuss the use of five supervised classifiers and report classification results for the map construction portion of the agent.


Confidence-based Tuning of Nomogram Predictions

AAAI Conferences

Instance classification using machine learning techniques has numerous applications, from automation to medical diagnosis. In many problem domains, such as spam filtering, classification must be performed quickly across large datasets. In this paper we begin with machine learning techniques based on naive Bayes classification and attempt to improve classification performance by taking into account attribute confidence intervals. Our prediction functions operate over nominal datasets and retain the asymptotic complexity of one-pass learning and prediction functions. We present preliminary results indicating a modest, albeit inconsistent, improvement over the naive Bayes classifier alone.
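For orientation, the naive Bayes baseline that the abstract starts from can be sketched as follows for nominal attributes. This shows only the one-pass baseline, not the confidence-interval tuning, whose details the abstract does not give. The class name `NominalNaiveBayes`, the add-one (Laplace) smoothing, and the toy data are assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


class NominalNaiveBayes:
    """Naive Bayes over nominal attributes, trained one example at a time."""

    def __init__(self):
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
        self.values = defaultdict(set)            # attribute index -> distinct values seen

    def learn(self, row, label):
        # Single pass: each example only increments counters.
        self.class_counts[label] += 1
        for j, v in enumerate(row):
            self.value_counts[(j, label)][v] += 1
            self.values[j].add(v)

    def predict(self, row):
        n = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            # Log prior plus the sum of smoothed log likelihoods per attribute.
            score = math.log(self.class_counts[c] / n)
            for j, v in enumerate(row):
                k = len(self.values[j]) or 1
                count = self.value_counts[(j, c)][v]
                score += math.log((count + 1) / (self.class_counts[c] + k))
            if score > best_score:
                best, best_score = c, score
        return best


# Illustrative use on made-up data.
nb = NominalNaiveBayes()
for _ in range(3):
    nb.learn(["free", "yes"], "spam")
    nb.learn(["meeting", "no"], "ham")
```

Both learning and prediction touch each attribute once per example, which is the one-pass complexity the abstract says the tuned prediction functions retain.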


VipBoost: A More Accurate Boosting Algorithm

AAAI Conferences

Boosting is a well-known method for improving the accuracy of many learning algorithms. In this paper, we propose a novel boosting algorithm, VipBoost (voting on boosting classifications from imputed learning sets), which first generates multiple incomplete datasets from the original dataset by randomly removing a small percentage of observed attribute values, then uses an imputer to fill in the missing values. It then applies AdaBoost (using some base learner) to each of the imputed learning sets, producing multiple classifiers. The subsequent prediction on a new test case is the most frequent classification from these classifiers. Our empirical results show that VipBoost produces very effective classifiers that significantly improve accuracy for unstable base learners and some stable learners, especially when the initial dataset is incomplete.
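The four steps described in the abstract (remove values, impute, boost, vote) can be sketched roughly as below. This is an illustrative reading, not the authors' implementation: the column-mode imputer, the equality-test decision stumps used as the base learner, the binary labels in {+1, -1}, and all function names and default parameters are assumptions made for the sketch.

```python
import math
import random
from collections import Counter


def mask_values(rows, rate, rng):
    """Copy rows, replacing each value by None with probability `rate`."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def impute_mode(rows):
    """Fill None entries with the most frequent observed value of the column."""
    modes = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        modes.append(Counter(seen).most_common(1)[0][0] if seen else None)
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def stump_predict(stump, row):
    j, value, sign = stump
    return sign if row[j] == value else -sign


def train_adaboost(rows, labels, rounds=10):
    """Discrete AdaBoost over equality stumps; labels must be +1 or -1."""
    n = len(rows)
    w = [1.0 / n] * n
    stumps = [(j, v, s)
              for j in range(len(rows[0]))
              for v in sorted({r[j] for r in rows}, key=repr)
              for s in (1, -1)]
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        errors = [sum(wi for wi, r, y in zip(w, rows, labels)
                      if stump_predict(st, r) != y) for st in stumps]
        best = min(range(len(stumps)), key=errors.__getitem__)
        err = min(max(errors[best], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stumps[best]))
        # Re-weight examples: misclassified ones gain weight.
        w = [wi * math.exp(-alpha * y * stump_predict(stumps[best], r))
             for wi, r, y in zip(w, rows, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(st, row) for alpha, st in model)
    return 1 if score >= 0 else -1


def vipboost_fit(rows, labels, n_sets=5, rate=0.1, seed=0):
    """Train one boosted classifier per masked-then-imputed copy of the data."""
    rng = random.Random(seed)
    return [train_adaboost(impute_mode(mask_values(rows, rate, rng)), labels)
            for _ in range(n_sets)]


def vipboost_predict(models, row):
    """Return the most frequent classification among the boosted classifiers."""
    votes = Counter(adaboost_predict(m, row) for m in models)
    return votes.most_common(1)[0][0]
```

A call such as `vipboost_predict(vipboost_fit(rows, labels), new_row)` then returns the majority vote. The imputer and base learner are deliberately simple here; the abstract leaves both open ("an imputer", "some base learner"), so any stronger choice could be substituted.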