AITopics | Genre

Collaborating Authors

Genre

Structured Priors for Structure Learning

Mansinghka, Vikash, Kemp, Charles, Griffiths, Thomas, Tenenbaum, Joshua

arXiv.org Artificial IntelligenceJun-27-2012

Traditional approaches to Bayes net structure learning typically assume little regularity in graph structure other than sparseness. However, in many cases, we expect more systematicity: variables in real-world systems often group into classes that predict the kinds of probabilistic dependencies they participate in. Here we capture this form of prior knowledge in a hierarchical Bayesian framework, and exploit it to enable structure learning and type discovery from small datasets. Specifically, we present a nonparametric generative model for directed acyclic graphs as a prior for Bayes net structure learning. Our model assumes that variables come in one or more classes and that the prior probability of an edge existing between two variables is a function only of their classes. We derive an MCMC algorithm for simultaneous inference of the number of classes, the class assignments of variables, and the Bayes net structure over variables. For several realistic, sparse datasets, we show that the bias towards systematicity of connections provided by our model yields more accurate learned networks than a traditional, uniform prior approach, and that the classes found by our model are appropriate.

artificial intelligence, knowledge, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1206.6852

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Output Space Search for Structured Prediction

Doppa, Janardhan Rao, Fern, Alan, Tadepalli, Prasad

arXiv.org Artificial IntelligenceJun-27-2012

We consider a framework for structured prediction based on search in the space of complete structured outputs. Given a structured input, an output is produced by running a time-bounded search procedure guided by a learned cost function, and then returning the least cost output uncovered during the search. This framework can be instantiated for a wide range of search spaces and search procedures, and easily incorporates arbitrary structured-prediction loss functions. In this paper, we make two main technical contributions. First, we define the limited-discrepancy search space over structured outputs, which is able to leverage powerful classification learning algorithms to improve the search space quality. Second, we give a generic cost function learning approach, where the key idea is to learn a cost function that attempts to mimic the behavior of conducting searches guided by the true loss function. Our experiments on six benchmark domains demonstrate that using our framework with only a small amount of search is sufficient for significantly improving on state-of-the-art structured-prediction performance.

artificial intelligence, inductive learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1206.646

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback

Variable noise and dimensionality reduction for sparse Gaussian processes

Snelson, Edward, Ghahramani, Zoubin

arXiv.org Machine LearningJun-27-2012

The sparse pseudo-input Gaussian process (SPGP) is a new approximation method for speeding up GP regression in the case of a large number of data points N. The approximation is controlled by the gradient optimization of a small set of M `pseudo-inputs', thereby reducing complexity from N^3 to NM^2. One limitation of the SPGP is that this optimization space becomes impractically big for high dimensional data sets. This paper addresses this limitation by performing automatic dimensionality reduction. A projection of the input space to a low dimensional space is learned in a supervised manner, alongside the pseudo-inputs, which now live in this reduced space. The paper also investigates the suitability of the SPGP for modeling data with input-dependent noise. A further extension of the model is made to make it even more powerful in this regard - we learn an uncertainty parameter for each pseudo-input. The combination of sparsity, reduced dimension, and input-dependent noise makes it possible to apply GPs to much larger and more complex data sets than was previously practical. We demonstrate the benefits of these methods on several synthetic and real world problems.

artificial intelligence, machine learning, spgp, (17 more...)

arXiv.org Machine Learning

1206.6873

Genre: Research Report (0.64)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)

Add feedback

Ranking by Dependence - A Fair Criteria

Steck, Harald

arXiv.org Machine LearningJun-27-2012

Estimating the dependences between random variables, and ranking them accordingly, is a prevalent problem in machine learning. Pursuing frequentist and information-theoretic approaches, we first show that the p-value and the mutual information can fail even in simplistic situations. We then propose two conditions for regularizing an estimator of dependence, which leads to a simple yet effective new measure. We discuss its advantages and compare it to well-established model-selection criteria. Apart from that, we derive a simple constraint for regularizing parameter estimates in a graphical model. This results in an analytical approximation for the optimal value of the equivalent sample size, which agrees very well with the more involved Bayesian approach in our experiments.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

1206.6871

Genre:

Research Report > Experimental Study (0.38)
Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Bayesian Random Fields: The Bethe-Laplace Approximation

Welling, Max, Parise, Sridevi

arXiv.org Machine LearningJun-27-2012

While learning the maximum likelihood value of parameters of an undirected graphical model is hard, modelling the posterior distribution over parameters given data is harder. Yet, undirected models are ubiquitous in computer vision and text modelling (e.g. conditional random fields). But where Bayesian approaches for directed models have been very successful, a proper Bayesian treatment of undirected models in still in its infant stages. We propose a new method for approximating the posterior of the parameters given data based on the Laplace approximation. This approximation requires the computation of the covariance matrix over features which we compute using the linear response approximation based in turn on loopy belief propagation. We develop the theory for conditional and 'unconditional' random fields with or without hidden variables. In the conditional setting we introduce a new variant of bagging suitable for structured domains. Here we run the loopy max-product algorithm on a 'super-graph' composed of graphs for individual models sampled from the posterior and connected by constraints. Experiments on real world data validate the proposed methods.

approximation, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1206.6868

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Bayesian Multicategory Support Vector Machines

Zhang, Zhihua, Jordan, Michael I.

arXiv.org Machine LearningJun-27-2012

We show that the multi-class support vector machine (MSVM) proposed by Lee et al. (2004) can be viewed as a MAP estimation procedure under an appropriate probabilistic interpretation of the classifier. We also show that this interpretation can be extended to a hierarchical Bayesian architecture and to a fully-Bayesian inference procedure for multiclass classification based on data augmentation. We present empirical results that show that the advantages of the Bayesian formalism are obtained without a loss in classification accuracy.

artificial intelligence, machine learning, msvm, (15 more...)

arXiv.org Machine Learning

1206.6863

Country: North America > United States > California (0.68)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

Predicting Conditional Quantiles via Reduction to Classification

Langford, John, Oliveira, Roberto, Zadrozny, Bianca

arXiv.org Machine LearningJun-27-2012

We show how to reduce the process of predicting general order statistics (and the median in particular) to solving classification. The accompanying theoretical statement shows that the regret of the classifier bounds the regret of the quantile regression under a quantile loss. We also test this reduction empirically against existing quantile regression methods on large real-world datasets and discover that it provides state-of-the-art performance.

artificial intelligence, machine learning, quantile regression, (15 more...)

arXiv.org Machine Learning

1206.686

Country: North America > United States > California (0.30)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Add feedback

Faster Gaussian Summation: Theory and Experiment

Lee, Dongryeol, Gray, Alexander G.

arXiv.org Machine LearningJun-27-2012

We provide faster algorithms for the problem of Gaussian summation, which occurs in many machine learning methods. We develop two new extensions - an O(Dp) Taylor expansion for the Gaussian kernel with rigorous error bounds and a new error control scheme integrating any arbitrary approximation method - within the best discretealgorithmic framework using adaptive hierarchical data structures. We rigorously evaluate these techniques empirically in the context of optimal bandwidth selection in kernel density estimation, revealing the strengths and weaknesses of current state-of-the-art approaches for the first time. Our results demonstrate that the new error control scheme yields improved performance, whereas the series expansion approach is only effective in low dimensions (five or less).

artificial intelligence, expansion, machine learning, (16 more...)

arXiv.org Machine Learning

1206.6857

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.55)

Add feedback

Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation

Porteous, Ian, Ihler, Alexander T., Smyth, Padhraic, Welling, Max

arXiv.org Machine LearningJun-27-2012

Nonparametric Bayesian approaches to clustering, information retrieval, language modeling and object recognition have recently shown great promise as a new paradigm for unsupervised data analysis. Most contributions have focused on the Dirichlet process mixture models or extensions thereof for which efficient Gibbs samplers exist. In this paper we explore Gibbs samplers for infinite complexity mixture models in the stick breaking representation. The advantage of this representation is improved modeling flexibility. For instance, one can design the prior distribution over cluster sizes or couple multiple infinite mixture models (e.g. over time) at the level of their parameters (i.e. the dependent Dirichlet process model). However, Gibbs samplers for infinite mixture models (as recently introduced in the statistics literature) seem to mix poorly over cluster labels. Among others issues, this can have the adverse effect that labels for the same cluster in coupled mixture models are mixed up. We introduce additional moves in these samplers to improve mixing over cluster labels and to bring clusters into correspondence. An application to modeling of storm trajectories is used to illustrate these ideas.

artificial intelligence, machine learning, sampler, (18 more...)

arXiv.org Machine Learning

1206.6845

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Matrix Tile Analysis

Givoni, Inmar, Cheung, Vincent, Frey, Brendan J.

arXiv.org Machine LearningJun-27-2012

Many tasks require finding groups of elements in a matrix of numbers, symbols or class likelihoods. One approach is to use efficient bi- or tri-linear factorization techniques including PCA, ICA, sparse matrix factorization and plaid analysis. These techniques are not appropriate when addition and multiplication of matrix elements are not sensibly defined. More directly, methods like bi-clustering can be used to classify matrix elements, but these methods make the overly-restrictive assumption that the class of each element is a function of a row class and a column class. We introduce a general computational problem, `matrix tile analysis' (MTA), which consists of decomposing a matrix into a set of non-overlapping tiles, each of which is defined by a subset of usually nonadjacent rows and columns. MTA does not require an algebra for combining tiles, but must search over discrete combinations of tile assignments. Exact MTA is a computationally intractable integer programming problem, but we describe an approximate iterative technique and a computationally efficient sum-product relaxation of the integer program. We compare the effectiveness of these methods to PCA and plaid on hundreds of randomly generated tasks. Using double-gene-knockout data, we show that MTA finds groups of interacting yeast genes that have biologically-related functions.

artificial intelligence, machine learning, matrix, (18 more...)

arXiv.org Machine Learning

1206.6833

Country:

North America > United States (0.46)
North America > Canada > Ontario (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback