On a Connection between Kernel PCA and Metric Multidimensional Scaling
Williams, Christopher K. I.
In this paper we show that the kernel PCA algorithm of Schölkopf et al (1998) can be interpreted as a form of metric multidimensional scaling (MDS) when the kernel function k(x, y) is isotropic, i.e. it depends only on ||x - y||. This leads to a metric MDS algorithm where the desired configuration of points is found via the solution of an eigenproblem rather than through the iterative optimization of the stress objective function. The question of kernel choice is also discussed.
1 Introduction
Suppose we are given n objects, and for each pair (i, j) we have a measurement of the "dissimilarity" δ_ij between the two objects. In multidimensional scaling (MDS) the aim is to place n points in a low-dimensional space (usually Euclidean) so that the interpoint distances d_ij have a particular relationship to the original dissimilarities. In classical scaling we would like the interpoint distances to be equal to the dissimilarities. For example, classical scaling can be used to reconstruct a map of the locations of some cities given the distances between them.
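The connection is easy to make concrete: classical scaling double-centres the squared dissimilarities and solves an eigenproblem, which is the same computation kernel PCA performs on a centred Gram matrix. A minimal sketch, assuming the dissimilarities are Euclidean distances (function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (metric) scaling: embed n points from an n x n
    dissimilarity matrix D so that interpoint distances match D.
    Assumes D contains Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    w, V = np.linalg.eigh(B)                   # the eigenproblem
    idx = np.argsort(w)[::-1][:k]              # top-k eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Example: recover a 2-D "map" of points from their pairwise distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, k=2)
# Y reproduces X up to rotation, reflection and translation.
```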
Using the Nyström Method to Speed Up Kernel Machines
Williams, Christopher K. I., Seeger, Matthias
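For context on the title: the Nyström method approximates quantities derived from the full n x n Gram matrix using only m << n sampled columns, which is what makes the speedup possible. A minimal sketch of the standard rank-m Gram-matrix approximation, with an illustrative RBF kernel (all names below are assumptions, not the paper's code):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Illustrative isotropic kernel; any positive semi-definite kernel works.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_gram(X, m, gamma=1.0, seed=0):
    """Approximate the full Gram matrix K by K_nm K_mm^+ K_mn using
    m randomly chosen landmark points: O(n m^2) work instead of O(n^3)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    K_nm = rbf_kernel(X, X[idx], gamma)            # n x m block
    K_mm = K_nm[idx]                               # m x m landmark block
    return K_nm @ np.linalg.pinv(K_mm) @ K_nm.T    # rank-m approximation

X = np.random.default_rng(1).normal(size=(200, 3))
K_approx = nystrom_gram(X, m=20)
```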
A MCMC Approach to Hierarchical Mixture Modelling
Williams, Christopher K. I.
There are many hierarchical clustering algorithms available, but these lack a firm statistical basis. Here we set up a hierarchical probabilistic mixture model, in which data are generated in a hierarchical tree-structured manner. We demonstrate Markov chain Monte Carlo (MCMC) methods that can be used to sample from the posterior distribution over trees containing variable numbers of hidden units.
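To make the generative story concrete, here is a minimal sketch that samples data from a fixed complete binary tree of Gaussian components, with each child perturbing its parent's mean; the paper's MCMC sampling over the tree structures themselves is not shown, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tree_means(depth, dim=2, spread=4.0, decay=0.5):
    """Means of a complete binary tree: each child's mean is a Gaussian
    perturbation of its parent's, with variance shrinking down the tree."""
    levels = [[np.zeros(dim)]]
    for d in range(1, depth + 1):
        scale = spread * decay ** d
        levels.append([mu + rng.normal(scale=scale, size=dim)
                       for mu in levels[-1] for _ in range(2)])
    return levels[-1]                    # the leaf components

def sample_data(n, leaf_means, noise=0.2):
    """Draw each point from a uniformly chosen leaf component."""
    leaves = np.array(leaf_means)
    z = rng.integers(len(leaves), size=n)        # hidden component labels
    return leaves[z] + rng.normal(scale=noise, size=(n, leaves.shape[1]))

X = sample_data(500, sample_tree_means(depth=3))  # 8 nested leaf clusters
```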
Discovering Hidden Features with Gaussian Processes Regression
Vivarelli, Francesco, Williams, Christopher K. I.
In Gaussian process regression the covariance function can be parameterized by a matrix W acting on the difference between input vectors. W is often taken to be diagonal, but if we allow W to be a general positive definite matrix which can be tuned on the basis of training data, then an eigen-analysis of W shows that we are effectively creating hidden features, where the dimensionality of the hidden-feature space is determined by the data. We demonstrate the superiority of predictions using the general matrix over those based on a diagonal matrix on two test problems.
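The eigen-analysis referred to above can be illustrated directly. A minimal sketch, assuming the covariance takes the common anisotropic RBF form k(x, x') = v0 exp(-0.5 (x - x')^T W (x - x')); the paper's exact parameterization may differ, and the numbers below are made up:

```python
import numpy as np

def gp_cov(X1, X2, W, v0=1.0):
    """Anisotropic RBF covariance with a general positive definite W."""
    diff = X1[:, None, :] - X2[None, :, :]
    return v0 * np.exp(-0.5 * np.einsum('ijk,kl,ijl->ij', diff, W, diff))

# A (made-up) non-diagonal W, as might be learned from training data.
W = np.array([[2.0, 1.2],
              [1.2, 1.0]])
X = np.random.default_rng(0).normal(size=(5, 2))
K = gp_cov(X, X, W)                     # covariance matrix under this W

# Eigen-analysis: W = V diag(evals) V^T, so with u = V^T x the covariance
# is diagonal in the rotated inputs u -- the columns of V are the "hidden
# features". Large eigenvalues mark directions of rapid variation; near-zero
# eigenvalues mark directions the data deem irrelevant, so the effective
# dimensionality is the number of non-negligible eigenvalues.
evals, evecs = np.linalg.eigh(W)
```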
Finite-Dimensional Approximation of Gaussian Processes
Ferrari-Trecate, Giancarlo, Williams, Christopher K. I., Opper, Manfred
Gaussian process (GP) prediction suffers from O(n^3) scaling with the data set size n. By using a finite-dimensional basis to approximate the GP predictor, the computational complexity can be reduced. We derive optimal finite-dimensional predictors under a number of assumptions, and show the superiority of these predictors over the Projected Bayes Regression method (which is asymptotically optimal). We also show how to calculate the minimal model size for a given n. The calculations are backed up by numerical experiments.
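A minimal sketch of how a finite-dimensional basis cuts the cost, using the generic subset-of-regressors construction with m basis functions k(., z_j): it illustrates the reduction from O(n^3) to roughly O(n m^2), but it is not the paper's optimal predictor, and all names are illustrative:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / ls**2)

def finite_dim_gp_predict(X, y, Xs, m=20, noise=0.1, seed=0):
    """GP regression restricted to m basis functions k(., z_j):
    solves an m x m system, O(n m^2) instead of the O(n^3) exact GP."""
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=m, replace=False)]   # landmark points
    Knm, Kmm = rbf(X, Z), rbf(Z, Z)
    A = Knm.T @ Knm + noise**2 * Kmm                   # m x m system
    w = np.linalg.solve(A, Knm.T @ y)
    return rbf(Xs, Z) @ w                              # predictive mean

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
Xs = np.linspace(-3, 3, 50)[:, None]
mu = finite_dim_gp_predict(X, y, Xs)
```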
DTs: Dynamic Trees
Williams, Christopher K. I., Adams, Nicholas J.
A dynamic tree model specifies a prior over a large number of trees, each one of which is a tree-structured belief net (TSBN). Our aim is to retain the advantages of tree-structured belief networks, namely the hierarchical structure of the model and (in part) the efficient inference algorithms, while avoiding the "blocky" artifacts that derive from a single, fixed TSBN structure. One use for DTs is as prior models over labellings for image segmentation problems. Section 2 of the paper gives the theory of DTs, and experiments are described in section 3.
2 Theory
There are two essential components that make up a dynamic tree network: (i) the tree architecture and (ii) the nodes and conditional probability tables (CPTs) in the given tree. We consider the architecture question first.
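Component (i) can be illustrated by sampling an architecture: each node in a layer independently chooses a parent in the layer above, with probabilities given by an affinity that favours nearby parents. A minimal sketch under that assumption; the paper's actual prior over trees (including options such as disconnection) is richer, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_architecture(layer_sizes, locality=1.0):
    """Sample parent choices Z for a layered dynamic tree: node i in
    layer l picks parent j in layer l-1 with probability proportional
    to an affinity favouring spatially nearby parents."""
    Z = []
    for l in range(1, len(layer_sizes)):
        n_par, n_chi = layer_sizes[l - 1], layer_sizes[l]
        # node positions, evenly spread on [0, 1] within each layer
        par = (np.arange(n_par) + 0.5) / n_par
        chi = (np.arange(n_chi) + 0.5) / n_chi
        aff = np.exp(-locality * np.abs(chi[:, None] - par[None, :]) * n_par)
        probs = aff / aff.sum(axis=1, keepdims=True)
        Z.append([rng.choice(n_par, p=p) for p in probs])
    return Z            # Z[l-1][i] = parent of node i in layer l

tree = sample_architecture([1, 4, 16])   # a small quadtree-like hierarchy
```

Resampling these parent choices per image is what lets the model escape the "blocky" artifacts of a single fixed TSBN.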