AITopics

We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given dataset of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pullback metrics of the Fisher information under a continuous group of transformations. When applied to documents, the resulting geodesics resemble, but outperform, the TFIDF cosine similarity measure in classification.

artificial intelligence, machine learning, natural language, (18 more...)

1212.2474

Country: Asia > Middle East (0.16)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Drton, Mathias, Richardson, Thomas S.

A New Algorithm for Maximum Likelihood Estimation in Gaussian Graphical Models for Marginal Independence

Graphical models with bi-directed edges (<->) represent marginal independence: the absence of an edge between two vertices indicates that the corresponding variables are marginally independent. In this paper, we consider maximum likelihood estimation in the case of continuous variables with a Gaussian joint distribution, sometimes termed a covariance graph model. We present a new fitting algorithm which exploits standard regression techniques and establish its convergence properties. Moreover, we contrast our procedure to existing estimation methods.

artificial intelligence, bayesian inference, machine learning, (14 more...)

1212.2462

Country: North America > United States > Washington > King County > Seattle (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Elidan, Gal, Friedman, Nir

The Information Bottleneck EM Algorithm

Learning with hidden variables is a central challenge in probabilistic graphical models that has important implications for many real-life problems. The classical approach is using the Expectation Maximization (EM) algorithm. This algorithm, however, can get trapped in local maxima. In this paper we explore a new approach that is based on the Information Bottleneck principle. In this approach, we view the learning problem as a tradeoff between two information theoretic objectives. The first is to make the hidden variables uninformative about the identity of specific instances. The second is to make the hidden variables informative about the observed attributes. By exploring different tradeoffs between these two objectives, we can gradually converge on a high-scoring solution. As we show, the resulting, Information Bottleneck Expectation Maximization (IB-EM) algorithm, manages to find solutions that are superior to standard EM methods.

artificial intelligence, bayesian inference, machine learning, (20 more...)

1212.246

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Bishop, Christopher M., Svensen, Markus

Bayesian Hierarchical Mixtures of Experts

The Hierarchical Mixture of Experts (HME) is a well-known tree-based model for regression and classification, based on soft probabilistic splits. In its original formulation it was trained by maximum likelihood, and is therefore prone to over-fitting. Furthermore the maximum likelihood framework offers no natural metric for optimizing the complexity and structure of the tree. Previous attempts to provide a Bayesian treatment of the HME model have relied either on ad-hoc local Gaussian approximations or have dealt with related models representing the joint distribution of both input and output variables. In this paper we describe a fully Bayesian treatment of the HME model based on variational inference. By combining local and global variational methods we obtain a rigourous lower bound on the marginal probability of the data under the model. This bound is optimized during the training phase, and its resulting value can be used for model order selection. We present results using this approach for a data set describing robot arm kinematics.

artificial intelligence, hme model, machine learning, (17 more...)

1212.2447

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Rosencrantz, Matthew, Gordon, Geoffrey, Thrun, Sebastian

Decentralized Sensor Fusion With Distributed Particle Filters

arXiv.org Artificial IntelligenceOct-19-2012

This paper presents a scalable Bayesian technique for decentralized state estimation from multiple platforms in dynamic environments. As has long been recognized, centralized architectures impose severe scaling limitations for distributed systems due to the enormous communication overheads. We propose a strictly decentralized approach in which only nearby platforms exchange information. They do so through an interactive communication protocol aimed at maximizing information flow. Our approach is evaluated in the context of a distributed surveillance scenario that arises in a robotic system for playing the game of laser tag. Our results, both from simulation and using physical robots, illustrate an unprecedented scaling capability to large teams of vehicles.

artificial intelligence, machine learning, robot, (17 more...)

arXiv.org Artificial Intelligence

1212.2493

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

arXiv.org Machine LearningOct-18-2012

Least Absolute Gradient Selector: Statistical Regression via Pseudo-Hard Thresholding

Yang, Kun

Variable selection in linear models plays a pivotal role in modern statistics. Hard-thresholding methods such as $l_0$ regularization are theoretically ideal but computationally infeasible. In this paper, we propose a new approach, called the LAGS, short for "least absulute gradient selector", to this challenging yet interesting problem by mimicking the discrete selection process of $l_0$ regularization. To estimate $\beta$ under the influence of noise, we consider, nevertheless, the following convex program [\hat{\beta} = \textrm{arg min}\frac{1}{n}\|X^{T}(y - X\beta)\|_1 + \lambda_n\sum_{i = 1}^pw_i(y;X;n)|\beta_i|] $\lambda_n > 0$ controls the sparsity and $w_i > 0$ dependent on $y, X$ and $n$ is the weights on different $\beta_i$; $n$ is the sample size. Surprisingly, we shall show in the paper, both geometrically and analytically, that LAGS enjoys two attractive properties: (1) LAGS demonstrates discrete selection behavior and hard thresholding property as $l_0$ regularization by strategically chosen $w_i$, we call this property "pseudo-hard thresholding"; (2) Asymptotically, LAGS is consistent and capable of discovering the true model; nonasymptotically, LAGS is capable of identifying the sparsity in the model and the prediction error of the coefficients is bounded at the noise level up to a logarithmic factor---$\log p$, where $p$ is the number of predictors. Computationally, LAGS can be solved efficiently by convex program routines for its convexity or by simplex algorithm after recasting it into a linear program. The numeric simulation shows that LAGS is superior compared to soft-thresholding methods in terms of mean squared error and parsimony of the model.

artificial intelligence, machine learning, predictor, (18 more...)

1204.2353

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.46)
Health & Medicine > Therapeutic Area > Urology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Machine LearningOct-17-2012

Mixture model for designs in high dimensional regression and the LASSO

Chrétien, Stéphane

The LASSO is a recent technique for variable selection in the regression model \bean y & = & X\beta +\epsilon, \eean where $X\in \R^{n\times p}$ and $\epsilon$ is a centered gaussian i.i.d. noise vector $\mathcal N(0,\sigma^2I)$. The LASSO has been proved to perform exact support recovery for regression vectors when the design matrix satisfies certain algebraic conditions and $\beta$ is sufficiently sparse. Estimation of the vector $X\beta$ has also extensively been studied for the purpose of prediction under the same algebraic conditions on $X$ and under sufficient sparsity of $\beta$. Among many other, the coherence is an index which can be used to study these nice properties of the LASSO. More precisely, a small coherence implies that most sparse vectors, with less nonzero components than the order $n/\log(p)$, can be recovered with high probability if its nonzero components are larger than the order $\sigma \sqrt{\log(p)}$. However, many matrices occuring in practice do not have a small coherence and thus, most results which have appeared in the litterature cannot be applied. The goal of this paper is to study a model for which precise results can be obtained. In the proposed model, the columns of the design matrix are drawn from a Gaussian mixture model and the coherence condition is imposed on the much smaller matrix whose columns are the mixture's centers, instead of on $X$ itself. Our main theorem states that $X\beta$ is as well estimated as in the case of small coherence up to a correction parametrized by the maximal variance in the mixture model.

artificial intelligence, inequality, machine learning, (14 more...)

1210.4762

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

arXiv.org Machine LearningOct-16-2012

Lifted Relational Variational Inference

Choi, Jaesik, Amir, Eyal

Hybrid continuous-discrete models naturally represent many real-world applications in robotics, finance, and environmental engineering. Inference with large-scale models is challenging because relational structures deteriorate rapidly during inference with observations. The main contribution of this paper is an efficient relational variational inference algorithm that factors largescale probability models into simpler variational models, composed of mixtures of iid (Bernoulli) random variables. The algorithm takes probability relational models of largescale hybrid systems and converts them to a close-to-optimal variational models. Then, it efficiently calculates marginal probabilities on the variational models by using a latent (or lifted) variable elimination or a lifted stochastic sampling. This inference is unique because it maintains the relational structure upon individual observations and during inference steps.

algorithm, miid, probability, (14 more...)

1210.4867

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Amizadeh, Saeed, Thiesson, Bo, Hauskrecht, Milos

Variational Dual-Tree Framework for Large-Scale Transition Matrix Approximation

arXiv.org Machine LearningOct-16-2012

In recent years, non-parametric methods utilizing random walks on graphs have been used to solve a wide range of machine learning problems, but in their simplest form they do not scale well due to the quadratic complexity. In this paper, a new dual-tree based variational approach for approximating the transition matrix and efficiently performing the random walk is proposed. The approach exploits a connection between kernel density estimation, mixture modeling, and random walk on graphs in an optimization of the transition matrix for the data graph that ties together edge transitions probabilities that are similar. Compared to the de facto standard approximation method based on k-nearestneighbors, we demonstrate order of magnitudes speedup without sacrificing accuracy for Label Propagation tasks on benchmark data sets in semi-supervised learning.

graph, partition, refinement, (16 more...)

1210.4846

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine LearningOct-16-2012

Fast Graph Construction Using Auction Algorithm

Wang, Jun, Xia, Yinglong

In practical machine learning systems, graph based data representation has been widely used in various learning paradigms, ranging from unsupervised clustering to supervised classification. Besides those applications with natural graph or network structure data, such as social network analysis and relational learning, many other applications often involve a critical step in converting data vectors to an adjacency graph. In particular, a sparse subgraph extracted from the original graph is often required due to both theoretic and practical needs. Previous study clearly shows that the performance of different learning algorithms, e.g., clustering and classification, benefits from such sparse subgraphs with balanced node connectivity. However, the existing graph construction methods are either computationally expensive or with unsatisfactory performance. In this paper, we utilize a scalable method called auction algorithm and its parallel extension to recover a sparse yet nearly balanced subgraph with significantly reduced computational cost. Empirical study and comparison with the stateof-art approaches clearly demonstrate the superiority of the proposed method in both efficiency and accuracy.

artificial intelligence, graph, machine learning, (14 more...)

1210.4917

Country:

North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)