AITopics

The bootstrap has become a popular method for exploring model (structure) uncertainty. Our experiments with artificial and realworld datademonstrate that the graphs learned from bootstrap samples can be severely biased towards too complex graphical models. Accountingfor this bias is hence essential, e.g., when exploring model uncertainty. We find that this bias is intimately tied to (well-known) spurious dependences induced by the bootstrap. The leading-order bias-correction equals one half of Akaike's penalty for model complexity. We demonstrate the effect of this simple bias-correction in our experiments. We also relate this bias to the bias of the plugin estimator for entropy, as well as to the difference betweenthe expected test and training errors of a graphical model, which asymptotically equals Akaike's penalty (rather than one half).

artificial intelligence, bayesian inference, machine learning, (20 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Eskin, Eleazar, Smola, Alex J., Vishwanathan, S.v.n.

Laplace Propagation

We present a novel method for approximate inference in Bayesian models andregularized risk functionals. It is based on the propagation of mean and variance derived from the Laplace approximation of conditional probabilitiesin factorizing distributions, much akin to Minka's Expectation Propagation. In the jointly normal case, it coincides with the latter and belief propagation, whereas in the general case, it provides an optimization strategy containing Support Vector chunking, the Bayes Committee Machine, and Gaussian Process chunking as special cases.

approximation, artificial intelligence, machine learning, (18 more...)

Country: Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Morris, Quaid D., Frey, Brendan J.

Denoising and Untangling Graphs Using Degree Priors

This paper addresses the problem of untangling hidden graphs from a set of noisy detections of undirected edges. We present a model of the generation of the observed graph that includes degree-based structure priors on the hidden graphs. Exact inference in the model is intractable; we present an efficient approximate inference algorithm tocompute edge appearance posteriors. We evaluate our model and algorithm on a biological graph inference problem.

artificial intelligence, graph, machine learning, (19 more...)

Country: North America > Canada > Ontario > Toronto (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Beygelzimer, Alina, Rish, Irina

Approximability of Probability Distributions

We consider the question of how well a given distribution can be approximated withprobabilistic graphical models. We introduce a new parameter, effective treewidth, that captures the degree of approximability as a tradeoff between the accuracy and the complexity of approximation. We present a simple approach to analyzing achievable tradeoffs that exploits thethreshold behavior of monotone graph properties, and provide experimental results that support the approach.

artificial intelligence, graph, machine learning, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Heskes, Tom, Zoeter, Onno, Wiegerinck, Wim

Approximate Expectation Maximization

The E-step boils down to computing probabilities of the hidden variables given the observed variables (evidence) and current set of parameters. The M-step then, given these probabilities, yields a new set of parameters guaranteed to increase the likelihood. In Bayesian networks, that will be the focus of this article, the M-step is usually relatively straightforward. A complication may arise in the E-step, when computing the probability of the hidden variables given the evidence becomes intractable. An often used approach is to replace the exact yet intractable inference in the E step with approximate inference, either through sampling or using a deterministic variational method. The use of a "mean-field" variational method in this context leads to an algorithm known as variational EM and can be given theinterpretation of minimizing a free energy with respect to both a tractable approximate distribution (approximate E-step) and the parameters (M-step) [2]. Loopy belief propagation [3] and variants thereof, such as generalized belief propagation [4]and expectation propagation [5], have become popular alternatives to the "mean-field" variational approaches, often yielding somewhat better approximations. Andindeed, they can and have been applied for approximate inference in the E-step of the EM algorithm (see e.g.

algorithm, artificial intelligence, machine learning, (18 more...)

Country:

Europe > Netherlands (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.90)

Paciorek, Christopher J., Schervish, Mark J.

Nonstationary Covariance Functions for Gaussian Process Regression

While the mixture approach is intriguing, neither of [8, 9] compare their model to other methods.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.28)
Europe > United Kingdom > England (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Kemp, Charles, Griffiths, Thomas L., Stromsten, Sean, Tenenbaum, Joshua B.

Semi-Supervised Learning with Trees

We describe a nonparametric Bayesian approach to generalizing from few labeled examples, guided by a larger set of unlabeled objects and the assumption of a latent tree-structure to the domain. The tree (or a distribution over trees) may be inferred using the unlabeled data. A prior over concepts generated by a mutation process on the inferred tree(s) allows efficient computation of the optimal Bayesian classification function fromthe labeled examples. We test our approach on eight real-world datasets.

artificial intelligence, bayesian inference, machine learning, (16 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Palmer, Jason, Rao, Bhaskar D., Wipf, David P.

Perspectives on Sparse Bayesian Learning

Recently, relevance vector machines (RVM) have been fashioned from a sparse Bayesian learning (SBL) framework to perform supervised learning usinga weight prior that encourages sparsity of representation. The methodology incorporates an additional set of hyperparameters governing theprior, one for each weight, and then adopts a specific approximation tothe full marginalization over all weights and hyperparameters. Despite its empirical success however, no rigorous motivation for this particular approximation is currently available. To address this issue, we demonstrate that SBL can be recast as the application of a rigorous variational approximationto the full model by expressing the prior in a dual form. This formulation obviates the necessity of assuming any hyperpriors andleads to natural, intuitive explanations of why sparsity is achieved in practice.

approximation, artificial intelligence, machine learning, (15 more...)

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.72)

Welling, Max, Williams, Christopher, Agakov, Felix V.

Extreme Components Analysis

Principal components analysis (PCA) is one of the most widely used techniques in machine learning and data mining. Minor components analysis (MCA) is less well known, but can also play an important role in the presence of constraints on the data distribution. In this paper we present a probabilistic model for "extreme components analysis" (XCA) which at the maximum likelihood solution extracts an optimal combination ofprincipal and minor components. For a given number of components, thelog-likelihood of the XCA model is guaranteed to be larger or equal than that of the probabilistic models for PCA and MCA. We describe anefficient algorithm to solve for the globally optimal solution. For log-convex spectra we prove that the solution consists of principal components only, while for log-concave spectra the solution consists of minor components. In general, the solution admits a combination of both. In experiments we explore the properties of XCA on some synthetic and real-world datasets.

artificial intelligence, eigenvalue, machine learning, (16 more...)