Discovering Hidden Features with Gaussian Processes Regression

Neural Information Processing Systems

W is often taken to be diagonal, but if we allow W to be a general positive definite matrix which can be tuned on the basis of training data, then an eigen-analysis of W shows that we are effectively creating hidden features, where the dimensionality of the hidden-feature space is determined by the data. We demonstrate the superiority of predictions using the general matrix over those based on a diagonal matrix on two test problems.
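The hidden-feature reading of a general W can be illustrated with a small sketch. It assumes a squared-exponential covariance of the form k(x, x') = v0 exp(-1/2 (x - x')^T W (x - x')), which the excerpt above does not state explicitly, and it uses a hypothetical rank-one W = U^T U (positive semi-definite, chosen so the effect is easy to see). Under these assumptions the kernel on the 3-dimensional inputs equals an ordinary isotropic kernel on the 1-dimensional hidden feature z = U x. The names `U`, `v0`, and both kernel functions are illustrative only.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def se_kernel_general(x, y, W, v0=1.0):
    """Squared-exponential kernel with a general distance matrix W (assumed form)."""
    d = [a - b for a, b in zip(x, y)]
    Wd = matvec(W, d)
    quad = sum(di * wdi for di, wdi in zip(d, Wd))  # (x - y)^T W (x - y)
    return v0 * math.exp(-0.5 * quad)

def se_kernel_isotropic(z, w, v0=1.0):
    """Squared-exponential kernel with identity distance matrix."""
    sq = sum((a - b) ** 2 for a, b in zip(z, w))
    return v0 * math.exp(-0.5 * sq)

# Hypothetical 1x3 projection: a single hidden feature of a 3-d input.
U = [[1.0, 2.0, -1.0]]
dim = len(U[0])
# W = U^T U has one non-zero eigenvalue, with eigenvector proportional to U[0].
W = [[sum(U[k][i] * U[k][j] for k in range(len(U))) for j in range(dim)]
     for i in range(dim)]

x = [0.5, -1.0, 2.0]
y = [1.5, 0.0, 1.0]

k_general = se_kernel_general(x, y, W)                       # kernel in input space
k_hidden = se_kernel_isotropic(matvec(U, x), matvec(U, y))   # kernel on z = U x
```

In general, the eigenvectors of W with non-negligible eigenvalues play the role of the rows of `U`, so the number of such eigenvalues indicates the effective dimensionality of the hidden-feature space.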


SMEM Algorithm for Mixture Models

Neural Information Processing Systems

We present a split and merge EM (SMEM) algorithm to overcome the local maximum problem in parameter estimation of finite mixture models. In the case of mixture models, non-global maxima often involve having too many components of a mixture model in one part of the space and too few in another, widely separated part of the space. To escape from such configurations we repeatedly perform simultaneous split and merge operations using a new criterion for efficiently selecting the split and merge candidates. We apply the proposed algorithm to the training of Gaussian mixtures and mixtures of factor analyzers using synthetic and real data and show the effectiveness of using the split and merge operations to improve the likelihood of both the training data and of held-out test data.

1 INTRODUCTION

Mixture density models, in particular normal mixtures, have been extensively used in the field of statistical pattern recognition [1]. Recently, more sophisticated mixture density models such as mixtures of latent variable models (e.g., probabilistic PCA or factor analysis) have been proposed to approximate the underlying data manifold [2]-[4].


Probabilistic Visualisation of High-Dimensional Binary Data

Neural Information Processing Systems

We present a probabilistic latent-variable framework for data visualisation, a key feature of which is its applicability to binary and categorical data types for which few established methods exist. A variational approximation to the likelihood is exploited to derive a fast algorithm for determining the model parameters. Illustrations of application to real and synthetic binary data sets are given.


Regularizing AdaBoost

Neural Information Processing Systems

We will also introduce a regularization strategy (analogous to weight decay) into boosting. This strategy uses slack variables to achieve a soft margin (section 4). Numerical experiments in section 5 show the validity of our regularization approach, and finally a brief conclusion is given.

2 AdaBoost Algorithm

Let $\{h_t(x) : t = 1, \dots, T\}$ be an ensemble of T hypotheses defined on input vector x and e


Replicator Equations, Maximal Cliques, and Graph Isomorphism

Neural Information Processing Systems

We present a new energy-minimization framework for the graph isomorphism problem which is based on an equivalent maximum clique formulation. The approach is centered around a fundamental result proved by Motzkin and Straus in the mid-1960s, and recently expanded in various ways, which allows us to formulate the maximum clique problem in terms of a standard quadratic program. To solve the program we use "replicator" equations, a class of simple continuous- and discrete-time dynamical systems developed in various branches of theoretical biology. We show how, despite their inability to escape from local solutions, they nevertheless provide experimental results which are competitive with those obtained using more elaborate mean-field annealing heuristics.

1 INTRODUCTION

The graph isomorphism problem is one of those few combinatorial optimization problems which still resist any computational complexity characterization [6]. Despite decades of active research, no polynomial-time algorithm for it has yet been found.
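As a rough illustration of the replicator approach described in this entry, the sketch below runs the standard discrete-time replicator update x_i <- x_i (A x)_i / (x^T A x) on the adjacency matrix A of a small hypothetical graph, starting from the centre of the simplex. It covers only the maximum clique step, not the reduction from graph isomorphism, and the graph, the iteration count, and the support threshold are assumptions made for the example. By the Motzkin-Straus result, the maximum of x^T A x over the simplex is 1 - 1/w, where w is the size of a maximum clique.

```python
def replicator_max_clique(adj, iters=200):
    """Discrete-time replicator dynamics on an adjacency matrix (list of rows)."""
    n = len(adj)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iters):
        payoff = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        avg = sum(x[i] * payoff[i] for i in range(n))  # x^T A x
        x = [x[i] * payoff[i] / avg for i in range(n)]
    return x

# Hypothetical graph: triangle {0, 1, 2} with a path 2 - 3 - 4 attached.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
n = 5
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

x = replicator_max_clique(adj)
# Vertices that keep non-negligible weight form the clique that was found.
clique = [i for i, xi in enumerate(x) if xi > 1e-6]
value = sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))
```

On this graph the weight concentrates on the triangle, and `value` approaches 1 - 1/3. On harder graphs the same update can stop at a clique that is maximal but not maximum, which is the local-solution limitation the abstract mentions.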


Kernel PCA and De-Noising in Feature Spaces

Neural Information Processing Systems

Kernel PCA as a nonlinear feature extractor has proven powerful as a preprocessing step for classification algorithms. But it can also be considered as a natural generalization of linear principal component analysis. This gives rise to the question of how to use nonlinear features for data compression, reconstruction, and de-noising, applications common in linear PCA. This is a nontrivial task, as the results provided by kernel PCA live in some high-dimensional feature space and need not have pre-images in input space. This work presents ideas for finding approximate pre-images, focusing on Gaussian kernels, and shows experimental results using these pre-images in data reconstruction and de-noising on toy examples as well as on real-world data.
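One commonly used scheme for approximate pre-images with a Gaussian kernel k(x, y) = exp(-||x - y||^2 / c) is a fixed-point iteration: given expansion coefficients gamma_i of a feature-space point sum_i gamma_i Phi(x_i), iterate z <- sum_i gamma_i k(z, x_i) x_i / sum_i gamma_i k(z, x_i). The excerpt above does not spell out the procedure, so the sketch below should be read as an illustration under that assumption. The coefficients `gamma`, the kernel width `c`, and the starting point `z0` are hypothetical inputs; in practice `gamma` would come from a kernel PCA projection, which is not computed here.

```python
import math

def gaussian_kernel(x, y, c):
    """Gaussian kernel exp(-||x - y||^2 / c)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / c)

def preimage_fixed_point(gamma, data, z0, c, iters=50):
    """Iterate z <- sum_i w_i x_i / sum_i w_i with w_i = gamma_i * k(z, x_i)."""
    z = list(z0)
    for _ in range(iters):
        w = [g * gaussian_kernel(z, x, c) for g, x in zip(gamma, data)]
        total = sum(w)
        if abs(total) < 1e-12:
            break  # denominator vanished; a different starting point is needed
        z = [sum(wi * x[k] for wi, x in zip(w, data)) / total
             for k in range(len(z))]
    return z

data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Sanity case: if the feature-space point is exactly Phi(x_1),
# the pre-image is x_1 itself.
z_exact = preimage_fixed_point([0.0, 1.0, 0.0], data, z0=[0.3, 0.3], c=1.0)

# Mixed coefficients (hypothetical values) give a weighted compromise.
z_mixed = preimage_fixed_point([0.2, 0.7, 0.1], data, z0=[0.3, 0.3], c=1.0)
```

The iteration is sensitive to the starting point and can stall when the denominator is close to zero, which is why the sketch stops early in that case instead of dividing.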


Exploratory Data Analysis Using Radial Basis Function Latent Variable Models

Neural Information Processing Systems

Two developments of nonlinear latent variable models based on radial basis functions are discussed: in the first, the use of priors or constraints on allowable models is considered as a means of preserving data structure in low-dimensional representations for visualisation purposes. Also, a resampling approach is introduced which makes more effective use of the latent samples in evaluating the likelihood.


Neural Networks for Density Estimation

Neural Information Processing Systems

Even if the underlying phenomena are inherently deterministic, the complexity of these phenomena often makes a probabilistic formulation the only feasible approach from the computational point of view. Although quantities such as the mean, the variance, and possibly higher order moments of a random variable have often been sufficient to characterize a particular problem, the quest for higher modeling accuracy and for more realistic assumptions drives us towards modeling the available random variables using their probability density. This of course leads us to the problem of density estimation (see [6]). The most common approach for density estimation is the nonparametric approach, where the density is determined according to a formula involving the data points available. The most common nonparametric methods are the kernel density estimator, also known as the Parzen window estimator [4], and the k-nearest neighbor technique [1].
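For reference, the Parzen window estimator mentioned above can be written in a few lines. This is a minimal one-dimensional sketch with a Gaussian window, p(x) = 1/(n h) sum_i K((x - x_i)/h); the sample values and the bandwidth `h` are made up for the example, and bandwidth selection is not addressed.

```python
import math

def parzen_density(x, samples, h):
    """Gaussian-window Parzen estimate of the density at x (bandwidth h > 0)."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

# Hypothetical one-dimensional sample.
samples = [-1.0, 0.0, 0.5, 2.0]
p_at_zero = parzen_density(0.0, samples, h=0.5)

# The estimate is a proper density: a Riemann sum over a wide grid is close to 1.
step = 0.01
grid = [-10.0 + step * i for i in range(2001)]
total_mass = sum(parzen_density(g, samples, h=0.5) for g in grid) * step
```

Each data point contributes one bump of width `h`, so the formula depends directly on the available data points, as the nonparametric description above indicates.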


A Polygonal Line Algorithm for Constructing Principal Curves

Neural Information Processing Systems

Principal curves have been defined as "self consistent" smooth curves which pass through the "middle" of a d-dimensional probability distribution or data cloud. Recently, we [1] have offered a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution. The new definition made it possible to carry out a theoretical analysis of learning principal curves from training data. In this paper we propose a practical construction based on the new definition. Simulation results demonstrate that the new algorithm compares favorably with previous methods both in terms of performance and computational complexity.
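The quantity being minimized in the definition above, the expected squared distance between points and a curve, has a simple empirical counterpart when the curve is a polygonal line. The sketch below computes only this objective, not the construction algorithm itself, which the excerpt does not describe. The vertex list and the data points are hypothetical.

```python
def sq_dist_to_segment(p, a, b):
    """Squared Euclidean distance from point p to the segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Projection parameter along the segment, clamped to stay within [a, b].
    t = 0.0 if denom == 0.0 else sum(u * v for u, v in zip(ap, ab)) / denom
    t = max(0.0, min(1.0, t))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return sum((pi - ci) ** 2 for pi, ci in zip(p, closest))

def mean_sq_dist_to_polyline(points, vertices):
    """Average over points of the squared distance to the nearest segment."""
    total = 0.0
    for p in points:
        total += min(sq_dist_to_segment(p, a, b)
                     for a, b in zip(vertices, vertices[1:]))
    return total / len(points)

# Hypothetical polygonal line with two segments, and two data points.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
points = [(0.5, 1.0), (2.0, 0.5)]
objective = mean_sq_dist_to_polyline(points, vertices)
```

A fitting procedure would adjust the vertices to reduce this value while respecting a constraint on the total length of the line, in keeping with the length-constrained definition in the abstract.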


Learning from Dyadic Data

Neural Information Processing Systems

Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This type of data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework of learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures. We propose an annealed version of the standard EM algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains.

1 Introduction

Over the past decade learning from data has become a highly active field of research distributed over many disciplines like pattern recognition, neural computation, statistics, machine learning, and data mining.