Uncertainty
Selective Sampling of Labelers for Approximating the Crowd
Ertekin, Seyda (Massachusetts Institute of Technology) | Hirsh, Haym (Rutgers University) | Rudin, Cynthia (Massachusetts Institute of Technology)
In this paper, we present CrowdSense, an algorithm for estimating the crowd's majority opinion by querying only a subset of it. CrowdSense works in an online fashion where examples come one at a time and it dynamically samples subsets of labelers based on an exploration/exploitation criterion. The algorithm produces a weighted combination of a subset of the labelers' votes that approximates the crowd's opinion. We also present two probabilistic variants of CrowdSense that are based on different assumptions on the joint probability distribution between the labelers' votes and the majority vote. Our experiments demonstrate that we can reliably approximate the entire crowd's vote by collecting opinions from a representative subset of the crowd.
Generalized Weighted Model Counting: An Efficient Monte-Carlo Meta-Algorithm
Xia, Lirong (Harvard University)
In this paper, we focus on computing the prices of securities represented by logical formulas in combinatorial prediction markets when the price function is represented by a Bayesian network. This problem turns out to be a natural extension of the weighted model counting (WMC) problem (Sang, Beame, and Kautz 2005), which we call the generalized weighted model counting (GWMC) problem. In GWMC, we are given a logical formula F and a polynomial-time computable weight function. We are asked to compute the total weight of the valuations that satisfy F. Based on importance sampling, we propose a Monte-Carlo meta-algorithm that has a good theoretical guarantee for formulas in disjunctive normal form (DNF). The meta-algorithm queries an oracle algorithm that computes marginal probabilities in Bayesian networks, and has the following theoretical guarantee. When the weight function can be approximately represented by a Bayesian network for which the oracle algorithm runs in polynomial time, our meta-algorithm becomes a fully polynomial-time randomized approximation scheme (FPRAS).
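As a rough illustration of the importance-sampling idea for DNF formulas, the Python sketch below estimates the total weight of satisfying valuations in the simplest setting, where the variables are independent. That product distribution is an assumption made here in place of a Bayesian network, and the code follows the classical Karp-Luby estimator rather than the paper's meta-algorithm. The function names and the clause encoding (a dict from variable index to required truth value) are illustrative only.

```python
import random


def clause_weight(clause, p):
    """Probability that every literal in the clause holds (independent variables)."""
    w = 1.0
    for var, val in clause.items():
        w *= p[var] if val else 1.0 - p[var]
    return w


def satisfies(assignment, clause):
    """True if the assignment makes every literal of the clause true."""
    return all(assignment[var] == val for var, val in clause.items())


def estimate_dnf_weight(clauses, p, n_samples, rng):
    """Karp-Luby style importance-sampling estimate of P(F) for a DNF formula F."""
    weights = [clause_weight(c, p) for c in clauses]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    acc = 0.0
    for _ in range(n_samples):
        # Pick a clause with probability proportional to its weight.
        k = rng.choices(range(len(clauses)), weights=weights)[0]
        # Sample a valuation conditioned on that clause being satisfied.
        assignment = [rng.random() < p_i for p_i in p]
        for var, val in clauses[k].items():
            assignment[var] = val
        # Correct for valuations that are covered by several clauses.
        cover = sum(satisfies(assignment, c) for c in clauses)
        acc += 1.0 / cover
    return total * acc / n_samples


# F = (x0 AND x1) OR (NOT x0 AND x2), each variable true with probability 0.5
clauses = [{0: True, 1: True}, {0: False, 2: True}]
estimate = estimate_dnf_weight(clauses, [0.5, 0.5, 0.5], 1000, random.Random(0))
```

Sampling from the clauses instead of from all valuations keeps the estimator's variance bounded even when the formula's total weight is tiny, which is the property that makes an FPRAS possible in the classical setting.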
An Information-Theoretic Metric for Collective Human Judgment
Waterhouse, Tamsyn Peronel (Google)
We consider the problem of evaluating the performance of human contributors for tasks involving answering a series of questions, each of which has a single correct answer. The answers may not be known a priori. We assert that the measure of a contributor's judgments is the amount by which having these judgments decreases the entropy of our discovering the answer. This quantity is the pointwise mutual information between the judgments and the answer. The expected value of this metric is the mutual information between the contributor and the answer prior, which can be computed using only the prior and the conditional probabilities of the contributor's judgments given a correct answer, without knowing the answers themselves. We also propose using multivariable information measures, such as conditional mutual information, to measure the interactions between contributors' judgments. These metrics have a variety of applications. They can be used as a basis for contributor performance evaluation and incentives. They can be used to measure the efficiency of the judgment collection process. If the collection process allows assignment of contributors to questions, they can also be used to optimize this scheduling.
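The two quantities named above can be computed directly from a prior and a table of conditional probabilities. The Python sketch below is a minimal illustration under assumed inputs: `prior[a]` is P(A = a) and `likelihood[a][j]` is P(J = j | A = a) for a single contributor. The function names and the data layout are hypothetical and are not taken from the paper.

```python
import math


def judgment_marginal(prior, likelihood, j):
    """P(J = j) = sum over answers a of P(a) * P(j | a)."""
    return sum(p_a * likelihood[a][j] for a, p_a in enumerate(prior))


def pointwise_mutual_information(prior, likelihood, a, j):
    """log2 P(j | a) / P(j): bits of information judgment j gives about answer a."""
    return math.log2(likelihood[a][j] / judgment_marginal(prior, likelihood, j))


def mutual_information(prior, likelihood):
    """Expected pointwise mutual information I(J; A), in bits."""
    total = 0.0
    for a, p_a in enumerate(prior):
        for j, p_j_given_a in enumerate(likelihood[a]):
            joint = p_a * p_j_given_a
            if joint > 0.0:  # zero-probability pairs contribute nothing
                total += joint * pointwise_mutual_information(prior, likelihood, a, j)
    return total


# A contributor who answers a balanced yes/no question correctly 90% of the time.
prior = [0.5, 0.5]
likelihood = [[0.9, 0.1], [0.1, 0.9]]
bits = mutual_information(prior, likelihood)
```

A perfectly reliable contributor on a balanced binary question yields 1 bit, and a contributor whose judgments are independent of the answer yields 0 bits.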
Improving Forecasting Accuracy Using Bayesian Network Decomposition in Prediction Markets
Berea, Anamaria (George Mason University) | Maxwell, Daniel (George Mason University) | Twardy, Charles (George Mason University)
We propose to improve the accuracy of prediction market forecasts by using Bayesian networks to constrain probabilities among related questions. Prediction markets are already known to increase forecast accuracy compared to single best estimates. Our own flat prediction market substantially beat a baseline linear opinion pool during the first year. One way to improve performance is by expressing relationships among the questions. Elsewhere we describe work on combinatorial markets. Here we show how to use Bayesian networks within a flat market. The general approach is to decompose a target question (hypothesis) into a set of related variables (causal factors and evidence), when the relationship among the variables is known with some confidence. Then the marginal probabilities for the variables in the Bayes net are updated using the market estimates, with the Bayes net enforcing coherence. This paper describes the overall concept, shows the results for a particular model of the potential Greek exit from the European Union, and describes the team's future research plan.
A Framework for Evaluating Approximation Methods for Gaussian Process Regression
Chalupka, Krzysztof, Williams, Christopher K. I., Murray, Iain
Gaussian process (GP) predictors are an important component of many Bayesian approaches to machine learning. However, even a straightforward implementation of Gaussian process regression (GPR) requires O(n^2) space and O(n^3) time for a dataset of n examples. Several approximation methods have been proposed, but there is a lack of understanding of the relative merits of the different approximations, and in what situations they are most useful. We recommend assessing the quality of the predictions obtained as a function of the compute time taken, and comparing to standard baselines (e.g., Subset of Data and FITC). We empirically investigate four different approximation algorithms on four different prediction problems, and make our code available to encourage future comparisons.
Transforming Graph Data for Statistical Relational Learning
Rossi, R. A., McDowell, L. K., Aha, D. W., Neville, J.
Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In this article, we examine and categorize techniques for transforming graph-based relational data to improve SRL algorithms. In particular, appropriate transformations of the nodes, links, and/or features of the data can dramatically affect the capabilities and results of SRL algorithms. We introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. More specifically, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.
The Bayesian Bridge
Polson, Nicholas G., Scott, James G., Windle, Jesse
We propose the Bayesian bridge estimator for regularized regression and classification. Two key mixture representations for the Bayesian bridge model are developed: (1) a scale mixture of normals with respect to an alpha-stable random variable; and (2) a mixture of Bartlett-Fejer kernels (or triangle densities) with respect to a two-component mixture of gamma random variables. Both lead to MCMC methods for posterior simulation, and these methods turn out to have complementary domains of maximum efficiency. The first representation is a well-known result due to West (1987), and is the better choice for collinear design matrices. The second representation is new, and is more efficient for orthogonal problems, largely because it avoids the need to deal with exponentially tilted stable random variables. It also provides insight into the multimodality of the joint posterior distribution, a feature of the bridge model that is notably absent under ridge or lasso-type priors. We prove a theorem that extends this representation to a wider class of densities representable as scale mixtures of betas, and provide an explicit inversion formula for the mixing distribution. The connections with slice sampling and scale mixtures of normals are explored. On the practical side, we find that the Bayesian bridge model outperforms its classical cousin in estimation and prediction across a variety of data sets, both simulated and real. We also show that the MCMC for fitting the bridge model exhibits excellent mixing properties, particularly for the global scale parameter. This makes for a favorable contrast with analogous MCMC algorithms for other sparse Bayesian models. All methods described in this paper are implemented in the R package BayesBridge. An extensive set of simulation results is provided in two supplemental files.
Characteristic of partition-circuit matroid through approximation number
Rough set theory is a useful tool for dealing with uncertain, granular and incomplete knowledge in information systems. It is based on equivalence relations or partitions. A matroid is a structure that generalizes linear independence in vector spaces, and matroid theory has a variety of applications in many fields. In this paper, we propose a new type of matroid, namely the partition-circuit matroid, which is induced by a partition. First, since a partition satisfies the circuit axioms of matroid theory, it induces a matroid, which is called a partition-circuit matroid. Because partitions and equivalence relations on the same universe are in one-to-one correspondence, some characteristics of partition-circuit matroids are studied through rough sets. Second, similar to the upper approximation number proposed by Wang and Zhu, we define the lower approximation number. Some characteristics of partition-circuit matroids and their dual matroids are investigated through the lower approximation number and the upper approximation number. Keywords: Rough set; Matroid; Partition-circuit matroid; Lower approximation number; Upper approximation number.
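The objects in this abstract are easy to make concrete on a small universe. The Python sketch below computes the standard rough-set lower and upper approximations of a set with respect to a partition, and checks independence in the matroid whose circuits are the partition blocks (a set is independent exactly when it contains no whole block). The two approximation numbers are implemented under an assumed definition, namely the number of blocks contained in the set and the number of blocks meeting it; the abstract does not state the definitions, so treat those two functions as a guess rather than the authors' formulation.

```python
def lower_approximation(partition, x):
    """Union of the blocks that are entirely contained in x."""
    return set().union(*(b for b in partition if b <= x))


def upper_approximation(partition, x):
    """Union of the blocks that share at least one element with x."""
    return set().union(*(b for b in partition if b & x))


def lower_approximation_number(partition, x):
    """Assumed definition: how many blocks lie entirely inside x."""
    return sum(1 for b in partition if b <= x)


def upper_approximation_number(partition, x):
    """Assumed definition: how many blocks intersect x."""
    return sum(1 for b in partition if b & x)


def is_independent(partition, x):
    """x is independent iff it contains no circuit, i.e. no whole block."""
    return not any(b <= x for b in partition)


partition = [{1, 2}, {3}, {4, 5, 6}]
x = {1, 2, 4}
lower = lower_approximation(partition, x)   # {1, 2}
upper = upper_approximation(partition, x)   # {1, 2, 4, 5, 6}
```

Under the assumed definitions, a set is independent in this matroid exactly when its lower approximation number is zero.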
A Generalized Mean Field Algorithm for Variational Inference in Exponential Families
Xing, Eric P., Jordan, Michael I., Russell, Stuart
Mean field methods, which entail approximating intractable probability distributions variationally with distributions from a tractable family, enjoy high efficiency and guaranteed convergence, and provide lower bounds on the true likelihood. But due to the requirement for model-specific derivation of the optimization equations and unclear inference quality in various models, they are not widely used as generic approximate inference algorithms. In this paper, we discuss a generalized mean field theory on variational approximation to a broad class of intractable distributions using a rich set of tractable distributions via constrained optimization over distribution spaces. We present a class of generalized mean field (GMF) algorithms for approximate inference in complex exponential family models, which entails limiting the optimization to the class of cluster-factorizable distributions. GMF is a generic method requiring no model-specific derivations. It factors a complex model into a set of disjoint variable clusters, and uses a set of canonical fixed-point equations to iteratively update the cluster distributions and converge to locally optimal cluster marginals that preserve the original dependency structure within each cluster, thereby fully decomposing the overall inference problem. We empirically analyzed the effect of different tractable families (clusters of different granularity) on inference quality, and compared GMF with BP on several canonical models. Possible extension to higher-order MF approximation is also discussed.
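To make the fixed-point iteration concrete, the Python sketch below shows the finest-granularity special case, in which every cluster is a single variable (naive mean field). It assumes a pairwise model over spins in {-1, +1} with density proportional to exp(sum over i<j of J[i][j] x_i x_j + sum over i of h[i] x_i). The model, the function name, and the stopping rule are assumptions chosen for illustration; the GMF algorithm itself uses larger clusters and is not reproduced here.

```python
import math


def naive_mean_field(J, h, n_sweeps=200, tol=1e-10):
    """Iterate m_i = tanh(h_i + sum_j J_ij m_j) until the means stop changing.

    J is a symmetric coupling matrix (list of lists), h the local fields.
    Returns the approximate means E[x_i] under the factorized distribution.
    """
    n = len(h)
    m = [0.0] * n
    for _ in range(n_sweeps):
        max_change = 0.0
        for i in range(n):
            # Effective field on variable i given the current neighbor means.
            field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
            new_m = math.tanh(field)
            max_change = max(max_change, abs(new_m - m[i]))
            m[i] = new_m
        if max_change < tol:
            break
    return m


# Two weakly coupled spins with small positive fields.
J = [[0.0, 0.5], [0.5, 0.0]]
h = [0.2, 0.1]
means = naive_mean_field(J, h)
```

Each update touches one variable while holding the others fixed, which is the coordinate-wise structure that cluster-based variants generalize by updating a whole cluster distribution at a time.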
Boltzmann Machine Learning with the Latent Maximum Entropy Principle
Wang, Shaojun, Schuurmans, Dale, Peng, Fuchun, Zhao, Yunxin
We present a new statistical learning paradigm for Boltzmann machines based on a new inference principle we have proposed: the latent maximum entropy principle (LME). LME is different both from Jaynes' maximum entropy principle and from standard maximum likelihood estimation. We demonstrate the LME principle by deriving new algorithms for Boltzmann machine parameter estimation, and show how a robust and fast new variant of the EM algorithm can be developed. Our experiments show that estimation based on LME generally yields better results than maximum likelihood estimation, particularly when inferring hidden units from small amounts of data.