AITopics

We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles, based on a Gaussian entropybound combined with a semidefinite outer bound on the marginal polytope. This combination leads to a log-determinant maximization problemthat can be solved by efficient interior point methods [8]. As with the Bethe approximation and its generalizations [12], the optimizing arguments of this problem can be taken as approximations to the exact marginals. In contrast to Bethe/Kikuchi approaches, our variational problemis strictly convex and so has a unique global optimum. An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function. In experimental trials, the performance of the log-determinant relaxation is comparable to or better than the sum-product algorithm, and by a substantial marginfor certain problem classes. Finally, the zero-temperature limit of our log-determinant relaxation recovers a class of well-known semidefinite relaxations for integer programming [e.g., 3].

artificial intelligence, constraint, machine learning, (16 more...)

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Bach, Francis R., Jordan, Michael I.

Learning Spectral Clustering

Spectral clustering refers to a class of techniques which rely on the eigenstructure ofa similarity matrix to partition points into disjoint clusters with points in the same cluster having high similarity and points in different clustershaving low similarity. In this paper, we derive a new cost function for spectral clustering based on a measure of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing this cost function with respect to the partition leads to a new spectral clustering algorithm. Minimizing with respect to the similarity matrix leads to an algorithm for learning the similarity matrix. We develop a tractable approximation of our cost function that is based on the power method of computing eigenvectors.

artificial intelligence, machine learning, matrix, (19 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Hoyle, David, Rattray, Magnus

Limiting Form of the Sample Covariance Eigenspectrum in PCA and Kernel PCA

We delive the limiting form of the eigenvalue spectrum for sample covariance matrices produced from non-isotropic data.

artificial intelligence, eigenvalue, machine learning, (15 more...)

Country: Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Kauchak, David, Dasgupta, Sanjoy

An Iterative Improvement Procedure for Hierarchical Clustering

We describe a procedure which finds a hierarchical clustering by hillclimbing. Thecost function we use is a hierarchical extension of the k-means cost; our local moves are tree restructurings and node reorderings. Weshow these can be accomplished efficiently, by exploiting special properties of squared Euclidean distances and by using techniques from scheduling algorithms.

artificial intelligence, cost function, machine learning, (15 more...)

Country: North America > United States > California (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.74)

Shental, Noam, Bar-hillel, Aharon, Hertz, Tomer, Weinshall, Daphna

Computing Gaussian Mixture Models with EM Using Equivalence Constraints

Density estimation with Gaussian Mixture Models is a popular generative techniqueused also for clustering. We develop a framework to incorporate side information in the form of equivalence constraints into the model estimation procedure. Equivalence constraints are defined on pairs of data points, indicating whether the points arise from the same source (positive constraints) or from different sources (negative constraints). Suchconstraints can be gathered automatically in some learning problems, and are a natural form of supervision in others. For the estimation of model parameters we present a closed form EM procedure which handles positive constraints, and a Generalized EM procedure using aMarkov net which handles negative constraints. Using publicly available data sets we demonstrate that such side information can lead to considerable improvement in clustering tasks, and that our algorithm is preferable to two other suggested methods using the same type of side information.

artificial intelligence, constraint, machine learning, (16 more...)

Country: Asia > Middle East > Israel (0.16)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Snelson, Edward, Ghahramani, Zoubin, Rasmussen, Carl E.

Warped Gaussian Processes

This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm choosesa nonlinear transformation such that transformed data is well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation.

artificial intelligence, machine learning, modeling & simulation, (17 more...)

Country:

Europe > Germany (0.28)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Modeling & Simulation (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Strohmann, Thomas R., Belitski, Andrei, Grudic, Gregory Z., DeCoste, Dennis

Sparse Greedy Minimax Probability Machine Classification

The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracybound Ω. The only assumptions that MPMC makes is that good estimates of means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally addingbasis functions (i.e.

artificial intelligence, machine learning, mpmc, (17 more...)

Country:

North America > United States > Colorado (0.16)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)

Welling, Max, Mnih, Andriy, Hinton, Geoffrey E.

Wormholes Improve Contrastive Divergence

In models that define probabilities via energies, maximum likelihood learning typically involves using Markov Chain Monte Carlo to sample from the model's distribution. If the Markov chain is started at the data distribution, learning often works well even if the chain is only run for a few time steps [3]. But if the data distribution contains modes separated by regions of very low density, brief MCMC will not ensure that different modes have the correct relative energies because it cannot move particles from one mode to another. We show how to improve brief MCMC by allowing long-range moves that are suggested by the data distribution. If the model is approximately correct, these long-range moves have a reasonable acceptance rate.

artificial intelligence, machine learning, wormhole, (18 more...)

Country: North America > Canada > Ontario > Toronto (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.78)

liu, Ting, Moore, Andrew W., Gray, Alexander

New Algorithms for Efficient High Dimensional Non-parametric Classification

This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational leeway, andwe investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phaseof SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN.

artificial intelligence, machine learning, node, (14 more...)

Country:

North America > United States > California (0.28)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.89)

Bagnell, J. A., Kakade, Sham M., Schneider, Jeff G., Ng, Andrew Y.

Policy Search by Dynamic Programming

We consider the policy search approach to reinforcement learning. We show that if a "baseline distribution" is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide nontrivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > United States > Pennsylvania (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)