Goto

Collaborating Authors

 Directed Networks


Generalized Weighted Model Counting: An Efficient Monte-Carlo Meta-Algorithm

AAAI Conferences

In this paper, we focus on computing the prices of secu- rities represented by logical formulas in combinatorial prediction markets when the price function is represented by a Bayesian network. This problem turns out to be a natural extension of the weighted model counting (WMC) problem (Sang, Bearne, and Kautz 2005), which we call generalized weighted model counting (GWMC) problem. In GWMC, we are given a logical formula F and a polynomial-time computable weight function. We are asked to compute the total weight of the valuations that satisfy F. Based on importance sampling, we propose a Monte-Carlo meta-algorithm that has a good theoretical guarantee for formulas in disjunctive normal form (DNF). The meta-algorithm queries an oracle algorithm that computes marginal probabilities in Bayesian networks, and has the following theoretical guarantee. When the weight function can be approximately represented by a Bayesian network for which the oracle algorithm runs in polynomial time, our meta-algorithm becomes a fully polynomial-time randomized approximation scheme (FPRAS).


An Information-Theoretic Metric for Collective Human Judgment

AAAI Conferences

We consider the problem of evaluating the performance of human contributors for tasks involving answering a series of questions, each of which has a single correct answer. The answers may not be known a priori. We assert that the measure of a contributorโ€™s judgments is the amount by which having these judgments decreases the entropy of our discovering the answer. This quantity is the pointwise mutual information between the judgments and the answer. The expected value of this metric is the mutual information between the contributor and the answer prior, which can be computed using only the prior and the conditional probabil- ities of the contributorโ€™s judgments given a correct answer, without knowing the answers themselves. We also propose using multivariable information measures, such as conditional mutual information, to measure the inter- actions between contributorsโ€™ judgments. These metrics have a variety of applications. They can be used as a basis for contributor performance evaluation and incentives. They can be used to measure the efficiency of the judgment collection process. If the collection process allows assignment of contributors to questions, they can also be used to optimize this scheduling.


Improving Forecasting Accuracy Using Bayesian Network Decomposition in Prediction Markets

AAAI Conferences

We propose to improve the accuracy of prediction market forecasts by using Bayesian networks to constrain probabilities among related questions. Prediction markets are already known to increase forecast accuracy compared to single best estimates. Our own flat prediction market substantially beat a baseline linear opinion pool during the first year. One way to improve performance is by expressing relationships among the questions. Elsewhere we describe work on combinatorial markets. Here we show how to use Bayesian networks within a flat market. The general approach is to decompose a target question (hypothesis) into a set of related variables (causal factors and evidence), when the relationship among the variables is known with some confidence. Then the marginal probabilities for the variables in the Bayes net are updated using the market estimates, with the Bayes net enforcing coherence. This paper describes the overall concept, shows the results for a particular model of the potential Greek exit from the European Union, and describes the teamโ€™s future research plan.


Integration of UMLS and MEDLINE in Unsupervised Word Sense Disambiguation

AAAI Conferences

Scarcity of training data for word sense disambiguation argues for the use of knowledge-based disambiguation methods, which rely on information available in terminological resources. Unfortunately, these resources are not generally optimized to perform word sense disambiguation. On the other hand, there are many examples of ambiguous biomedical words with context in MEDLINE. However, these examples of ambiguity are not labeled with their proper sense. We propose the integration of the UMLS and MEDLINE to create concept profiles which are used to perform knowledge-based word sense disambiguation. Our results show an accuracy of 0.8770 on a biomedical word sense disambiguation data set; this represents a statistically significant improvement over other knowledge-based methods based on the UMLS on this data set.


A Framework for Evaluating Approximation Methods for Gaussian Process Regression

arXiv.org Machine Learning

Gaussian process (GP) predictors are an important component of many Bayesian approaches to machine learning. However, even a straightforward implementation of Gaussian process regression (GPR) requires O(n^2) space and O(n^3) time for a dataset of n examples. Several approximation methods have been proposed, but there is a lack of understanding of the relative merits of the different approximations, and in what situations they are most useful. We recommend assessing the quality of the predictions obtained as a function of the compute time taken, and comparing to standard baselines (e.g., Subset of Data and FITC). We empirically investigate four different approximation algorithms on four different prediction problems, and make our code available to encourage future comparisons.


Soft (Gaussian CDE) regression models and loss functions

arXiv.org Machine Learning

Regression, unlike classification, has lacked a comprehensive and effective approach to deal with cost-sensitive problems by the reuse (and not a re-training) of general regression models. In this paper, a wide variety of cost-sensitive problems in regression (such as bids, asymmetric losses and rejection rules) can be solved effectively by a lightweight but powerful approach, consisting of: (1) the conversion of any traditional one-parameter crisp regression model into a two-parameter soft regression model, seen as a normal conditional density estimator, by the use of newly-introduced enrichment methods; and (2) the reframing of an enriched soft regression model to new contexts by an instance-dependent optimisation of the expected loss derived from the conditional normal distribution.


Transforming Graph Data for Statistical Relational Learning

Journal of Artificial Intelligence Research

Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In this article, we examine and categorize techniques for transforming graph-based relational data to improve SRL algorithms. In particular, appropriate transformations of the nodes, links, and/or features of the data can dramatically affect the capabilities and results of SRL algorithms. We introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. More specifically, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.


The Bayesian Bridge

arXiv.org Machine Learning

We propose the Bayesian bridge estimator for regularized regression and classification. Two key mixture representations for the Bayesian bridge model are developed: (1) a scale mixture of normals with respect to an alpha-stable random variable; and (2) a mixture of Bartlett--Fejer kernels (or triangle densities) with respect to a two-component mixture of gamma random variables. Both lead to MCMC methods for posterior simulation, and these methods turn out to have complementary domains of maximum efficiency. The first representation is a well known result due to West (1987), and is the better choice for collinear design matrices. The second representation is new, and is more efficient for orthogonal problems, largely because it avoids the need to deal with exponentially tilted stable random variables. It also provides insight into the multimodality of the joint posterior distribution, a feature of the bridge model that is notably absent under ridge or lasso-type priors. We prove a theorem that extends this representation to a wider class of densities representable as scale mixtures of betas, and provide an explicit inversion formula for the mixing distribution. The connections with slice sampling and scale mixtures of normals are explored. On the practical side, we find that the Bayesian bridge model outperforms its classical cousin in estimation and prediction across a variety of data sets, both simulated and real. We also show that the MCMC for fitting the bridge model exhibits excellent mixing properties, particularly for the global scale parameter. This makes for a favorable contrast with analogous MCMC algorithms for other sparse Bayesian models. All methods described in this paper are implemented in the R package BayesBridge. An extensive set of simulation results are provided in two supplemental files.


A Distance-Based Branch and Bound Feature Selection Algorithm

arXiv.org Machine Learning

There is no known efficient method for selecting k Gaussian features from n which achieve the lowest Bayesian classification error. We show an example of how greedy algorithms faced with this task are led to give results that are not optimal. This motivates us to propose a more robust approach. We present a Branch and Bound algorithm for finding a subset of k independent Gaussian features which minimizes the naive Bayesian classification error. Our algorithm uses additive monotonic distance measures to produce bounds for the Bayesian classification error in order to exclude many feature subsets from evaluation, while still returning an optimal solution. We test our method on synthetic data as well as data obtained from gene expression profiling.


Boltzmann Machine Learning with the Latent Maximum Entropy Principle

arXiv.org Machine Learning

We present a new statistical learning paradigm for Boltzmann machines based on a new inference principle we have proposed: the latent maximum entropy principle (LME). LME is different both from Jaynes maximum entropy principle and from standard maximum likelihood estimation.We demonstrate the LME principle BY deriving new algorithms for Boltzmann machine parameter estimation, and show how robust and fast new variant of the EM algorithm can be developed.Our experiments show that estimation based on LME generally yields better results than maximum likelihood estimation, particularly when inferring hidden units from small amounts of data.