AITopics

An autonomous variational inference algorithm for arbitrary graphical models requires the ability to optimize variational approximations over the space of model parameters as well as over the choice of tractable families used for the variational approximation. In this paper, we present a novel combination of graph partitioning algorithms with a generalized mean field (GMF) inference algorithm. This combination optimizes over disjoint clusterings of variables and performs inference using those clusters. We provide a formal analysis of the relationship between the graph cut and the GMF approximation, and explore several graph partition strategies empirically. Our empirical results provide rather clear support for a weighted version of MinCut as a useful clustering algorithm for GMF inference, which is consistent with the implications of the formal analysis.
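A minimal sketch of the kind of quantity a weighted MinCut clustering trades off, assuming an undirected model whose pairwise interactions are given as edge weights. It scores a disjoint clustering by the total absolute weight of the edges it cuts (interactions that GMF can only handle approximately, across cluster boundaries) and brute-forces the best two-way split of a fixed size. The function names, the use of absolute weights and the toy graph are illustrative assumptions, not the paper's definitions.

```python
from itertools import combinations


def cut_weight(edges, clusters):
    """Total |weight| of edges whose endpoints lie in different clusters.

    edges: dict mapping (u, v) -> pairwise interaction weight (hypothetical)
    clusters: list of disjoint sets of variable names
    """
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(abs(w) for (u, v), w in edges.items() if label[u] != label[v])


def best_bipartition(variables, edges, size):
    """Exhaustively find the split with `size` variables on one side that
    minimizes the weighted cut. Exponential cost: tiny graphs only."""
    variables = list(variables)
    best = None
    for side in combinations(variables, size):
        a = set(side)
        b = set(variables) - a
        w = cut_weight(edges, [a, b])
        if best is None or w < best[0]:
            best = (w, a, b)
    return best


# Two tightly coupled pairs (A-B, C-D) joined by weak links.
edges = {("A", "B"): 2.0, ("B", "C"): 0.1, ("C", "D"): 1.5, ("A", "D"): -0.2}
w, a, b = best_bipartition("ABCD", edges, 2)
print(round(w, 3), sorted(a), sorted(b))  # prints: 0.3 ['A', 'B'] ['C', 'D']
```

The weak links are the ones cut, so each strongly coupled pair stays inside one cluster where GMF can treat it exactly.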

algorithm, artificial intelligence, machine learning

1207.4156

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Xiong, Xuejian, Chan, Kap, Tan, Kian Lee

Similarity-Driven Cluster Merging Method for Unsupervised Fuzzy Clustering

In this paper, a similarity-driven cluster merging method is proposed for unsupervised fuzzy clustering. The cluster merging method is used to resolve the problem of cluster validation. Starting with an overspecified number of clusters in the data, pairs of similar clusters are merged based on the proposed similarity-driven cluster merging criterion. The similarity between clusters is calculated by a fuzzy cluster similarity matrix, while an adaptive threshold is used for merging. In addition, a modified generalized objective function is used for prototype-based fuzzy clustering. The function includes the p-norm distance measure as well as principal components of the clusters. The number of principal components is determined automatically from the data being clustered. The properties of this unsupervised fuzzy clustering algorithm are illustrated by several experiments.
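The merge loop can be sketched as follows, assuming a fuzzy membership matrix U (clusters × points) from an earlier fuzzy clustering run with too many clusters. The fuzzy Jaccard similarity and the fixed `threshold` argument are stand-ins chosen for illustration; the paper's own similarity matrix and adaptive threshold are not reproduced here.

```python
import numpy as np


def fuzzy_similarity(U):
    """Pairwise cluster similarity from memberships U (c x n).

    Illustrative fuzzy Jaccard overlap: sum(min) / sum(max) over points.
    """
    c = U.shape[0]
    S = np.zeros((c, c))
    for i in range(c):
        for j in range(c):
            S[i, j] = np.minimum(U[i], U[j]).sum() / np.maximum(U[i], U[j]).sum()
    return S


def merge_similar(U, threshold):
    """Merge the most similar pair while its similarity exceeds `threshold`.

    Merged memberships are summed, so every column of U still sums to one.
    """
    U = np.asarray(U, dtype=float)
    while U.shape[0] > 1:
        S = fuzzy_similarity(U)
        np.fill_diagonal(S, -np.inf)          # ignore self-similarity
        i, j = np.unravel_index(np.argmax(S), S.shape)
        if S[i, j] <= threshold:
            break
        merged = U[i] + U[j]
        keep = [k for k in range(U.shape[0]) if k not in (i, j)]
        U = np.vstack([U[keep], merged])
    return U


# Clusters 0 and 1 split one true group; cluster 2 is distinct.
U = np.array([[0.50, 0.45, 0.05, 0.0],
              [0.45, 0.50, 0.05, 0.0],
              [0.05, 0.05, 0.90, 1.0]])
merged = merge_similar(U, threshold=0.5)
print(merged.shape)
```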

algorithm, artificial intelligence, machine learning

1207.4155

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

An Extended Cencov-Campbell Characterization of Conditional Information Geometry

Lebanon, Guy

We formulate and prove an axiomatic characterization of conditional information geometry, for both the normalized and the nonnormalized cases. This characterization extends the axiomatic derivation of the Fisher geometry by Cencov and Campbell to the cone of positive conditional models and, as a special case, to the manifold of conditional distributions. Due to the close connection between the conditional I-divergence and the product Fisher information metric, the characterization provides a new axiomatic interpretation of the primal problems underlying logistic regression and AdaBoost.
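For concreteness, a small helper for the conditional I-divergence of nonnormalized conditional models, assuming the common definition $D_r(p,q)=\sum_x r(x)\sum_y\left[p(y|x)\log\frac{p(y|x)}{q(y|x)}-p(y|x)+q(y|x)\right]$ with a weighting $r(x)$ over inputs. The function name and array layout are assumptions for illustration. When every row of p and q sums to one, the $-p+q$ terms cancel and the expression reduces to an r-weighted average of KL divergences.

```python
import numpy as np


def conditional_i_divergence(p, q, r):
    """D_r(p, q) for positive conditional models p(y|x), q(y|x).

    p, q: arrays of shape (n_x, n_y) with positive entries; rows need not
          sum to one (the nonnormalized case).
    r:    nonnegative weights over x, shape (n_x,).
    """
    p, q, r = (np.asarray(a, dtype=float) for a in (p, q, r))
    inner = p * np.log(p / q) - p + q   # each term is >= 0, zero iff p == q
    return float(r @ inner.sum(axis=1))


p = np.array([[0.7, 0.3], [0.2, 0.8]])
q = np.array([[0.5, 0.5], [0.5, 0.5]])
r = np.array([0.5, 0.5])
print(conditional_i_divergence(p, q, r))
```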

artificial intelligence, machine learning, matrix

1207.4139

Genre: Research Report (0.91)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Murray, Iain, Ghahramani, Zoubin

Bayesian Learning in Undirected Graphical Models: Approximate MCMC algorithms

Bayesian learning in undirected graphical models (computing posterior distributions over parameters and predictive quantities) is exceptionally difficult. We conjecture that for general undirected models, there are no tractable MCMC (Markov Chain Monte Carlo) schemes giving the correct equilibrium distribution over parameters. While this intractability, due to the partition function, is familiar to those performing parameter optimisation, Bayesian learning of posterior distributions over undirected model parameters has remained unexplored and poses novel challenges. We propose several approximate MCMC schemes and test them on fully observed binary models (Boltzmann machines) for a small coronary heart disease data set and larger artificial systems. While approximations must perform well on the model, their interaction with the sampling scheme is also important. Samplers based on variational mean-field approximations generally performed poorly; more advanced methods using loopy propagation, brief sampling and stochastic dynamics led to acceptable parameter posteriors. Finally, we demonstrate these techniques on a Markov random field with hidden variables.
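To see why the partition function is the crux, here is a minimal random-walk Metropolis sampler over the parameters of a tiny, fully observed Boltzmann machine with ±1 units. It is exact only because Z is computed by enumerating all 2**d states; the Gaussian prior, step size and toy data are assumptions, and none of the paper's approximate schemes (mean-field, loopy propagation, brief sampling) are implemented. Those approximations exist precisely to stand in for `log_partition` when enumeration is infeasible.

```python
import itertools

import numpy as np


def unpack(theta, d):
    """Split parameters into biases b (d,) and a symmetric, zero-diagonal
    coupling matrix J (d, d)."""
    b = theta[:d]
    J = np.zeros((d, d))
    J[np.triu_indices(d, k=1)] = theta[d:]
    return b, J + J.T


def log_partition(theta, d):
    """Exact log Z by enumerating all 2**d states (small d only)."""
    b, J = unpack(theta, d)
    S = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))
    scores = S @ b + 0.5 * np.einsum("si,ij,sj->s", S, J, S)
    m = scores.max()
    return m + np.log(np.exp(scores - m).sum())   # stable log-sum-exp


def log_posterior(theta, X, prior_var=1.0):
    d = X.shape[1]
    b, J = unpack(theta, d)
    scores = X @ b + 0.5 * np.einsum("ni,ij,nj->n", X, J, X)
    log_lik = scores.sum() - len(X) * log_partition(theta, d)
    return log_lik - 0.5 * np.sum(theta ** 2) / prior_var  # Gaussian prior


def metropolis(X, n_steps=2000, step=0.2, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    theta = np.zeros(d + d * (d - 1) // 2)
    lp = log_posterior(theta, X)
    samples = []
    for _ in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_posterior(proposal, X)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis acceptance
            theta, lp = proposal, lp_prop
        samples.append(theta.copy())
    return np.array(samples)


# Toy data: units 0 and 1 always agree, unit 2 is unrelated.
X = np.array([[1, 1, 1], [1, 1, -1], [-1, -1, 1], [-1, -1, -1]] * 5, dtype=float)
samples = metropolis(X)
print(samples[1000:, 3].mean())  # posterior mean of the coupling J[0, 1]
```

Every proposal calls `log_partition`, whose cost doubles with each added unit; that is the step the paper's approximate schemes replace.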

approximation, artificial intelligence, machine learning

1207.4134

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.75)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Altun, Yasemin, Smola, Alex, Hofmann, Thomas

Exponential Families for Conditional Random Fields

In this paper, we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and give constructions for kernels. Finally, we present efficient means of solving the optimization problem using reduced-rank decompositions, and we show how stationarity can be exploited efficiently in the optimization process.
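Reduced-rank decompositions replace an n × n kernel (Gram) matrix K by a low-rank factor L with K ≈ L Lᵀ. The abstract does not say which decomposition is used; as one common example, here is a sketch of pivoted incomplete Cholesky factorization, which greedily picks the point with the largest residual variance at each step. The `rank` and `tol` parameters and the toy kernel are illustrative.

```python
import numpy as np


def incomplete_cholesky(K, rank, tol=1e-10):
    """Pivoted incomplete Cholesky: returns L (n x m, m <= rank), K ~= L @ L.T.

    Stops early once the largest remaining diagonal residual drops below tol.
    """
    n = K.shape[0]
    d = np.array(np.diag(K), dtype=float)   # residual variances (a copy)
    L = np.zeros((n, rank))
    for j in range(rank):
        i = int(np.argmax(d))               # pivot: largest residual
        if d[i] <= tol:
            return L[:, :j]
        L[:, j] = (K[:, i] - L[:, :j] @ L[i, :j]) / np.sqrt(d[i])
        d -= L[:, j] ** 2
    return L


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
K = X @ X.T                                 # a linear kernel of rank 3
L = incomplete_cholesky(K, rank=10)
print(L.shape, np.abs(K - L @ L.T).max())
```

Downstream solves then work with the n × m factor instead of the full n × n matrix.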

artificial intelligence, clique, machine learning

1207.4131

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Buntine, Wray L., Jakulin, Aleks

Applying Discrete PCA in Data Analysis

Methods for analysis of principal components in discrete data have existed for some time under various names such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper, we explore a number of extensions to the common theory and present some applications of these methods to common statistical tasks. We show that these methods can be interpreted as a discrete version of ICA. We develop a hierarchical version yielding components at different levels of detail, and additional techniques for Gibbs sampling. We compare the algorithms on a text prediction task using support vector machines, and on information retrieval.
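Latent Dirichlet allocation (multinomial PCA) is closely related to the models listed above. Below is a sketch of the textbook collapsed Gibbs sampler for it; it is not the paper's hierarchical variant or its additional Gibbs techniques, and the hyperparameters `alpha`, `beta` and the toy corpus are assumptions.

```python
import numpy as np


def collapsed_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01,
                    n_sweeps=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # topic counts per document
    nkw = np.zeros((n_topics, n_words))     # word counts per topic
    nk = np.zeros(n_topics)                 # total tokens per topic
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove token i from the counts, then resample its topic.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                probs = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_words * beta)
                k = rng.choice(n_topics, p=probs / probs.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    # Point estimates of document-topic and topic-word proportions
    # from the final sample.
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi


docs = [[0, 1] * 10, [0, 1] * 10, [2, 3] * 10, [2, 3] * 10]
theta, phi = collapsed_gibbs(docs, n_topics=2, n_words=4)
print(np.round(theta, 2))
```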

artificial intelligence, machine learning, natural language

1207.4125

Country:

Europe (0.94)
North America > United States (0.93)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.94)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Gammerman, Alex, Kalnishkan, Yuri, Vovk, Vladimir

On-line Prediction with Kernels and the Complexity Approximation Principle

The paper describes an application of the Aggregating Algorithm to the problem of regression. It generalizes earlier results concerned with plain linear regression to kernel techniques and presents an on-line algorithm which performs nearly as well as any oblivious kernel predictor. The paper contains the derivation of an estimate of the performance of this algorithm. The estimate is then used to derive an application of the Complexity Approximation Principle to kernel methods.
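As a sketch of the kernel version of the Aggregating Algorithm for Regression, assuming the closed form commonly given for it: at step T the prediction is $\gamma_T = (y_1,\dots,y_{T-1},0)\,(aI + K)^{-1} k$, where K is the kernel matrix of $x_1,\dots,x_T$ (including the new point) and k is its last column. The zero placeholder for the unknown label is what distinguishes it from plain kernel ridge regression. The RBF kernel, the value of `a` and the toy data are assumptions.

```python
import numpy as np


def rbf(u, v, gamma=1.0):
    return np.exp(-gamma * np.sum((u - v) ** 2))


def kaar_predict(X, y_past, a=1.0, kernel=rbf):
    """Predict the label of X[-1] from the labels y_past of X[:-1]."""
    T = len(X)
    K = np.array([[kernel(X[i], X[j]) for j in range(T)] for i in range(T)])
    y_ext = np.append(y_past, 0.0)        # unknown current label set to 0
    return float(y_ext @ np.linalg.solve(a * np.eye(T) + K, K[:, -1]))


def online_kaar(X, y, a=1.0):
    """On-line protocol: predict each y[t] before it is revealed."""
    return np.array([kaar_predict(X[: t + 1], y[:t], a) for t in range(len(X))])


X = np.linspace(0.0, 3.0, 30).reshape(-1, 1)
y = np.sin(X[:, 0])
preds = online_kaar(X, y, a=0.1)
print(np.mean(np.abs(preds[10:] - y[10:])))
```

For clarity the kernel matrix is rebuilt from scratch at every step; an efficient version would update the inverse incrementally.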

artificial intelligence, machine learning, prediction strategy

1207.4113

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Globerson, Amir, Tishby, Naftali

The Minimum Information Principle for Discriminative Learning

Exponential models of distributions are widely used in machine learning for classification and modelling. It is well known that they can be interpreted as maximum entropy models under empirical expectation constraints. In this work, we argue that for classification tasks, mutual information is a more suitable information-theoretic measure to be optimized. We show how the principle of minimum mutual information generalizes that of maximum entropy, and provides a comprehensive framework for building discriminative classifiers. A game-theoretic interpretation of our approach is then given, and several generalization bounds are provided. We present iterative algorithms for solving the minimum information problem and its convex dual, and demonstrate their performance on various classification tasks. The results show that minimum information classifiers outperform the corresponding maximum entropy models.
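The information-theoretic quantity at the center of this approach is the mutual information between features and labels. A minimal helper computing I(X;Y) in nats from a discrete joint table is shown below for reference; it does not implement the minimum-information optimization itself, and the example tables are made up.

```python
import numpy as np


def mutual_information(joint):
    """I(X;Y) in nats for a discrete joint distribution p(x, y) as a 2-D array."""
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()
    px = p.sum(axis=1, keepdims=True)      # marginal p(x), shape (n, 1)
    py = p.sum(axis=0, keepdims=True)      # marginal p(y), shape (1, m)
    mask = p > 0                           # 0 * log 0 is taken as 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))


independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent), mutual_information(dependent))
```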

artificial intelligence, constraint, machine learning

1207.411

Country: Asia > Middle East (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)

Yuan, Changhe, Lu, Tsai-Ching, Druzdzel, Marek J.

Annealed MAP

arXiv.org Artificial Intelligence, Jul-11-2012

Maximum a Posteriori assignment (MAP) is the problem of finding the most probable instantiation of a set of variables given partial evidence on the other variables in a Bayesian network. MAP has been shown to be an NP-hard problem [22], even for constrained networks such as polytrees [18]. Hence, previous approaches often fail to yield any results for MAP problems in large, complex Bayesian networks. To address this problem, we propose the AnnealedMAP algorithm, a simulated annealing-based MAP algorithm. AnnealedMAP simulates a non-homogeneous Markov chain whose invariant function is a probability density that concentrates on the modes of the target density. We tested this algorithm on several real Bayesian networks. The results show that, while maintaining good quality of the MAP solutions, AnnealedMAP is also able to solve many problems that are beyond the reach of previous approaches.
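A generic simulated-annealing sketch for a MAP-style query on a toy network, to show the mechanics: propose flipping one variable, always accept improvements, accept worse states with probability $\exp(\Delta/T)$, and cool T geometrically. It is not AnnealedMAP itself: it treats every non-evidence variable as a MAP variable (nothing is summed out), and the network (Cloudy → Sprinkler, Cloudy → Rain, both → WetGrass, with WetGrass = 1 observed), its CPT values and the cooling schedule are illustrative assumptions.

```python
import math
import random

P_C1 = {0: 0.5, 1: 0.5}                  # P(C)
P_S1 = {0: 0.5, 1: 0.1}                  # P(S=1 | C)
P_R1 = {0: 0.2, 1: 0.8}                  # P(R=1 | C)
P_W1 = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}  # P(W=1 | S, R)


def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1


def log_score(x, w=1):
    """log P(C, S, R, W=w): the MAP objective up to the constant log P(W=w)."""
    c, s, r = x
    p = P_C1[c] * bern(P_S1[c], s) * bern(P_R1[c], r) * bern(P_W1[(s, r)], w)
    return math.log(p) if p > 0 else -math.inf


def annealed_map(n_iters=2000, t0=2.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(3)]
    lp = log_score(x)
    best, best_lp = list(x), lp
    t = t0
    for _ in range(n_iters):
        y = list(x)
        i = rng.randrange(3)
        y[i] = 1 - y[i]                  # propose flipping one variable
        lp_y = log_score(y)
        # Short-circuit keeps (lp_y - lp) finite-or-(-inf), never nan.
        if lp_y >= lp or rng.random() < math.exp((lp_y - lp) / t):
            x, lp = y, lp_y
            if lp > best_lp:
                best, best_lp = list(x), lp
        t *= cooling                     # geometric cooling schedule
    return tuple(best), math.exp(best_lp)


best, prob = annealed_map()
print(best, prob)  # most probable (C, S, R) with P(C, S, R, W=1)
```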

artificial intelligence, bayesian inference, machine learning

1207.4153

Country: North America > United States > California > San Francisco County > San Francisco (0.29)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Boots, Byron, Gordon, Geoffrey J.

A Spectral Learning Approach to Range-Only SLAM

arXiv.org Machine Learning, Jul-10-2012

We present a novel spectral learning algorithm for simultaneous localization and mapping (SLAM) from range data with known correspondences. This algorithm is an instance of a general spectral system identification framework, from which it inherits several desirable properties, including statistical consistency and no local optima. Compared with popular batch optimization or multiple-hypothesis tracking (MHT) methods for range-only SLAM, our spectral approach offers guaranteed low computational requirements and good tracking performance. Compared with popular extended Kalman filter (EKF) or extended information filter (EIF) approaches, and many MHT ones, our approach does not need to linearize a transition or measurement model; such linearizations can cause severe errors in EKFs and EIFs, and to a lesser extent MHT, particularly for the highly non-Gaussian posteriors encountered in range-only SLAM. We provide a theoretical analysis of our method, including finite-sample error bounds. Finally, we demonstrate on a real-world robotic SLAM problem that our algorithm is not only theoretically justified, but works well in practice: in a comparison of multiple methods, the lowest errors come from a combination of our algorithm with batch optimization, but our method alone produces nearly as good a result at far lower computational cost.

1 Introduction

In range-only SLAM, we are given a sequence of range measurements from a robot to fixed landmarks, and possibly a matching sequence of odometry measurements. We then attempt to simultaneously estimate the robot's trajectory and the locations of the landmarks. In all the above approaches, the most popular representation for a hypothesis is a list of landmark locations $(m_{n,x}, m_{n,y})$ and a list of robot poses $(x_t, y_t, \theta_t)$.
Unfortunately, both the motion and measurement models are highly nonlinear in this representation, leading to computational problems: inaccurate linearizations in EKF/EIF/MHT and local optima in batch optimization approaches (see Section 2 for details). Much work has attempted to remedy this problem, e.g., by changing the hypothesis representation (Djugash, 2010) or by keeping multiple hypotheses (Djugash et al., 2005; Djugash, 2010; Thrun et al., 2005). While considerable progress has been made, none of these methods is ideal; common difficulties include the need for an extensive initialization phase, inability to recover from poor initialization, lack of performance guarantees, or excessive computational requirements. We take a very different approach: we formulate range-only SLAM as a matrix factorization problem, where features of observations are linearly related to a 4- or 7-dimensional state space.
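A short check of why a 4-dimensional linear state space can suffice for range data, assuming (as we read it) that squared ranges serve as the observation features. Expanding $d_{n,t}^2 = (x_t - m_{n,x})^2 + (y_t - m_{n,y})^2$ gives an inner product between a landmark vector $(m_{n,x}^2 + m_{n,y}^2, -2m_{n,x}, -2m_{n,y}, 1)$ and a robot vector $(1, x_t, y_t, x_t^2 + y_t^2)$, so the landmarks × time matrix of squared ranges has rank at most 4 and an SVD recovers the robot's state up to a linear transform. The random positions are synthetic, and the paper's full algorithm (odometry, the 7-dimensional variant, noise handling) is not reproduced.

```python
import numpy as np


def squared_ranges(landmarks, positions):
    """D[n, t] = squared distance between landmark n and robot position t."""
    diff = landmarks[:, None, :] - positions[None, :, :]
    return np.sum(diff ** 2, axis=2)


def landmark_features(landmarks):
    mx, my = landmarks[:, 0], landmarks[:, 1]
    return np.column_stack([mx ** 2 + my ** 2, -2 * mx, -2 * my, np.ones(len(mx))])


def robot_features(positions):
    x, y = positions[:, 0], positions[:, 1]
    return np.column_stack([np.ones(len(x)), x, y, x ** 2 + y ** 2])


rng = np.random.default_rng(0)
landmarks = rng.uniform(0, 10, size=(6, 2))
positions = rng.uniform(0, 10, size=(50, 2))
D = squared_ranges(landmarks, positions)

# Exact factorization D = L @ S.T with inner dimension 4.
L, S = landmark_features(landmarks), robot_features(positions)

# Spectral step: the top-4 right singular vectors span the same space as S.
_, s, Vt = np.linalg.svd(D, full_matrices=False)
V4 = Vt[:4].T                            # (50, 4) estimated state sequence
residual = S - V4 @ (V4.T @ S)           # part of S outside the recovered span
print(s[4] / s[0], np.abs(residual).max())
```

Because the factorization is linear, no linearization of the range model is needed, and the SVD has no local optima; with noisy ranges, the trailing singular values become small rather than zero.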

algorithm, artificial intelligence, machine learning

1207.2491

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)