AITopics | Technology

Clique Matrices for Statistical Graph Decomposition and Parameterising Restricted Positive Definite Matrices

Barber, David

We introduce Clique Matrices as an alternative representation of undirected graphs, being a generalisation of the incidence matrix representation. Here we use clique matrices to decompose a graph into a set of possibly overlapping clusters, de ned as well-connected subsets of vertices. The decomposition is based on a statistical description which encourages clusters to be well connected and few in number. Inference is carried out using a variational approximation. Clique matrices also play a natural role in parameterising positive de nite matrices under zero constraints on elements of the matrix. We show that clique matrices can parameterise all positive de nite matrices restricted according to a decomposable graph and form a structured Factor Analysis approximation in the non-decomposable case.

artificial intelligence, bayesian inference, machine learning, (17 more...)

1206.3237

Country:

North America > United States > New York (0.04)
Asia > Middle East > Iraq (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Laber, Eric B., Murphy, Susan A.

Small Sample Inference for Generalization Error in Classification Using the CUD Bound

Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization error by resampling and then assume the resampled estimator follows a known distribution to form a confidence set [Kohavi 1995, Martin 1996,Yang 2006]. Alternatively, one might bootstrap the resampled estimator of the generalization error to form a confidence set. Unfortunately, these methods do not reliably provide sets of the desired confidence. The poor performance appears to be due to the lack of smoothness of the generalization error as a function of the learned classifier. This results in a non-normal distribution of the estimated generalization error. We construct a confidence set for the generalization error by use of a smooth upper bound on the deviation between the resampled estimate and generalization error. The confidence set is formed by bootstrapping this upper bound. In cases in which the approximation class for the classifier can be represented as a parametric additive model, we provide a computationally efficient algorithm. This method exhibits superior performance across a series of test and simulated data sets.

artificial intelligence, generalization error, machine learning, (16 more...)

1206.3274

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > Portugal > Braga > Braga (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Bayesian Out-Trees

Jebara, Tony S.

A Bayesian treatment of latent directed graph structure for non-iid data is provided where each child datum is sampled with a directed conditional dependence on a single unknown parent datum. The latent graph structure is assumed to lie in the family of directed out-tree graphs which leads to efficient Bayesian inference. The latent likelihood of the data and its gradients are computable in closed form via Tutte's directed matrix tree theorem using determinants and inverses of the out-Laplacian. This novel likelihood subsumes iid likelihood, is exchangeable and yields efficient unsupervised and semi-supervised learning algorithms. In addition to handling taxonomy and phylogenetic datasets the out-tree assumption performs surprisingly well as a semi-parametric density estimator on standard iid datasets. Experiments with unsupervised and semisupervised learning are shown on various UCI and taxonomy datasets.

artificial intelligence, likelihood, machine learning, (17 more...)

1206.3269

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.89)

Brunskill, Emma, Leffler, Bethany, Li, Lihong, Littman, Michael L., Roy, Nicholas

CORL: A Continuous-state Offset-dynamics Reinforcement Learner

Continuous state spaces and stochastic, switching dynamics characterize a number of rich, realworld domains, such as robot navigation across varying terrain. We describe a reinforcementlearning algorithm for learning in these domains and prove for certain environments the algorithm is probably approximately correct with a sample complexity that scales polynomially with the state-space dimension. Unfortunately, no optimal planning techniques exist in general for such problems; instead we use fitted value iteration to solve the learned MDP, and include the error due to approximate planning in our bounds. Finally, we report an experiment using a robotic car driving over varying terrain to demonstrate that these dynamics representations adequately capture real-world dynamics and that our algorithm can be used to efficiently solve such problems.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1206.3231

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning Convex Inference of Marginals

Domke, Justin

Graphical models trained using maximum likelihood are a common tool for probabilistic inference of marginal distributions. However, this approach suffers difficulties when either the inference process or the model is approximate. In this paper, the inference process is first defined to be the minimization of a convex function, inspired by free energy approximations. Learning is then done directly in terms of the performance of the inference process at univariate marginal prediction. The main novelty is that this is a direct minimization of emperical risk, where the risk measures the accuracy of predicted marginals.

artificial intelligence, bayesian inference, machine learning, (18 more...)

1206.3247

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Maryland (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)

Ganapathi, Varun, Vickrey, David, Duchi, John, Koller, Daphne

Constrained Approximate Maximum Entropy Learning of Markov Random Fields

Parameter estimation in Markov random fields (MRFs) is a difficult task, in which inference over the network is run in the inner loop of a gradient descent procedure. Replacing exact inference with approximate methods such as loopy belief propagation (LBP) can suffer from poor convergence. In this paper, we provide a different approach for combining MRF learning and Bethe approximation. We consider the dual of maximum likelihood Markov network learning - maximizing entropy with moment matching constraints - and then approximate both the objective and the constraints in the resulting optimization problem. Unlike previous work along these lines (Teh & Welling, 2003), our formulation allows parameter sharing between features in a general log-linear model, parameter regularization and conditional training. We show that piecewise training (Sutton & McCallum, 2005) is a very restricted special case of this formulation. We study two optimization strategies: one based on a single convex approximation and one that uses repeated convex approximations. We show results on several real-world networks that demonstrate that these algorithms can significantly outperform learning with loopy and piecewise. Our results also provide a framework for analyzing the trade-offs of different relaxations of the entropy objective and of the constraints.

artificial intelligence, machine learning, objective, (16 more...)

1206.3257

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Cseke, Botond, Heskes, Tom

Bounds on the Bethe Free Energy for Gaussian Networks

We address the problem of computing approximate marginals in Gaussian probabilistic models by using mean field and fractional Bethe approximations. As an extension of Welling and Teh (2001), we define the Gaussian fractional Bethe free energy in terms of the moment parameters of the approximate marginals and derive an upper and lower bound for it. We give necessary conditions for the Gaussian fractional Bethe free energies to be bounded from below. It turns out that the bounding condition is the same as the pairwise normalizability condition derived by Malioutov et al. (2006) as a sufficient condition for the convergence of the message passing algorithm. By giving a counterexample, we disprove the conjecture in Welling and Teh (2001): even when the Bethe free energy is not bounded from below, it can possess a local minimum to which the minimization algorithms can converge.

artificial intelligence, bethe free energy, machine learning, (19 more...)

1206.3243

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Europe > Netherlands > Gelderland > Nijmegen (0.05)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Welling, Max, Teh, Yee Whye, Kappen, Hilbert

Hybrid Variational/Gibbs Collapsed Inference in Topic Models

Variational Bayesian inference and (collapsed) Gibbs sampling are the two important classes of inference algorithms for Bayesian networks. Both have their advantages and disadvantages: collapsed Gibbs sampling is unbiased but is also inefficient for large count values and requires averaging over many samples to reduce variance. On the other hand, variational Bayesian inference is efficient and accurate for large count values but suffers from bias for small counts. We propose a hybrid algorithm that combines the best of both worlds: it samples very small counts and applies variational updates to large counts. This hybridization is shown to significantly improve testset perplexity relative to variational inference at no computational cost.

artificial intelligence, bayesian inference, machine learning, (15 more...)

1206.3297

Country:

North America > United States > California > Orange County > Irvine (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Hoyer, Patrik O., Hyvarinen, Aapo, Scheines, Richard, Spirtes, Peter L., Ramsey, Joseph, Lacerda, Gustavo, Shimizu, Shohei

Causal discovery of linear acyclic models with arbitrary distributions

An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; Pearl 2000) cannot distinguish between independence-equivalent models, whereas approaches purely based on Independent Component Analysis (Shimizu et al. 2006) are inapplicable to data which is partially Gaussian. In this paper, we generalize and combine the two approaches, to yield a method able to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible. We give exact graphical conditions for when two distinct models represent the same family of distributions, and empirically demonstrate the power of our method through thorough simulations.

artificial intelligence, disturbance, machine learning, (17 more...)

1206.326

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)