Goto

Collaborating Authors

 Statistical Learning


Going Metric: Denoising Pairwise Data

Neural Information Processing Systems

Pairwise data in empirical sciences typically violate metricity, either dueto noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multidimensional scaling (MDS) that allows us to apply a variety of classical machine learningand signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step.Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation.We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. 1 Introduction Unsupervised grouping or clustering aims at extracting hidden structure from data (see e.g.


Constraint Classification for Multiclass Classification and Ranking

Neural Information Processing Systems

We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.


A Formulation for Minimax Probability Machine Regression

Neural Information Processing Systems

We formulate the regression problem as one of maximizing the minimum probability,symbolized by Ω, that future predicted outputs of the regression model will be within some ε bound of the true regression function. Our formulation is unique in that we obtain a direct estimate of this lower probability bound Ω. The proposed framework, minimax probability machine regression (MPMR), is based on the recently described minimaxprobability machine classification algorithm [Lanckriet et al.] and uses Mercer Kernels to obtain nonlinear regression models. MPMR is tested on both toy and real world data, verifying the accuracy of the Ω bound, and the efficacy of the regression models.


One-Class LP Classifiers for Dissimilarity Representations

Neural Information Processing Systems

Problems in which abnormal or novel situations should be detected can be approached by describing the domain of the class of typical examples. Theseapplications come from the areas of machine diagnostics, fault detection, illness identification or, in principle, refer to any problem wherelittle knowledge is available outside the typical class. In this paper we explain why proximities are natural representations for domain descriptors and we propose a simple one-class classifier for dissimilarity representations. By the use of linear programming an efficient one-class description can be found, based on a small number of prototype objects. This classifier can be made (1) more robust by transforming the dissimilarities and(2) cheaper to compute by using a reduced representation set. Finally, a comparison to a comparable one-class classifier by Campbell and Bennett is given.


Adaptive Classification by Variational Kalman Filtering

Neural Information Processing Systems

We propose in this paper a probabilistic approach for adaptive inference of generalized nonlinear classification that combines the computational advantage of a parametric solution with the flexibility of sequential sampling techniques.We regard the parameters of the classifier as latent states in a first order Markov process and propose an algorithm which can be regarded as variational generalization of standard Kalman filtering. Thevariational Kalman filter is based on two novel lower bounds that enable us to use a non-degenerate distribution over the adaptation rate. An extensive empirical evaluation demonstrates that the proposed method is capable of infering competitive classifiers both in stationary and non-stationary environments. Although we focus on classification, the algorithm is easily extended to other generalized nonlinear models.


Parametric Mixture Models for Multi-Labeled Text

Neural Information Processing Systems

We propose probabilistic generative models, called parametric mixture models(PMMs), for multiclass, multi-labeled text categorization problem.Conventionally, the binary classification approach has been employed, in which whether or not text belongs to a category isjudged by the binary classifier for every category. In contrast, our approach can simultaneously detect multiple categories of text using PMMs. We derive efficient learning and prediction algorithms forPMMs. We also empirically show that our method could significantly outperform the conventional binary methods when applied tomulti-labeled text categorization using real World Wide Web pages.


Global Versus Local Methods in Nonlinear Dimensionality Reduction

Neural Information Processing Systems

Recently proposed algorithms for nonlinear dimensionality reduction fall broadly into two categories which have different advantages and disadvantages: global(Isomap [1]), and local (Locally Linear Embedding [2], Laplacian Eigenmaps [3]). We present two variants of Isomap which combine the advantages of the global approach with what have previously beenexclusive advantages of local methods: computational sparsity and the ability to invert conformal maps.


Half-Lives of EigenFlows for Spectral Clustering

Neural Information Processing Systems

Using a Markov chain perspective of spectral clustering we present an algorithm to automatically find the number of stable clusters in a dataset. The Markov chain's behaviour is characterized by the spectral properties of the matrix of transition probabilities, from which we derive eigenflows along with their halflives. An eigenflow describes the flow of probability massdue to the Markov chain, and it is characterized by its eigenvalue, orequivalently, by the halflife of its decay as the Markov chain is iterated. A ideal stable cluster is one with zero eigenflow and infinite half-life.The key insight in this paper is that bottlenecks between weakly coupled clusters can be identified by computing the sensitivity of the eigenflow's halflife to variations in the edge weights. We propose a novel EIGENCUTS algorithm to perform clustering that removes these identified bottlenecks in an iterative fashion.


Self Supervised Boosting

Neural Information Processing Systems

Boosting algorithms and successful applications thereof abound for classification andregression learning problems, but not for unsupervised learning. We propose a sequential approach to adding features to a random fieldmodel by training them to improve classification performance between the data and an equal-sized sample of "negative examples" generated fromthe model's current estimate of the data density.


Independent Components Analysis through Product Density Estimation

Neural Information Processing Systems

We present a simple direct approach for solving the ICA problem, using density estimation and maximum likelihood. Given a candidate orthogonalframe, we model each of the coordinates using a semi-parametric density estimate based on cubic splines. Since our estimates have two continuous derivatives, we can easily run a second ordersearch for the frame parameters. Our method performs very favorably when compared to state-of-the-art techniques. 1 Introduction Independent component analysis (ICA) is a popular enhancement over principal component analysis (PCA) and factor analysis. IRP which is assumed to arise from a linear mixing of a latent random source vector S E IRP, (1) X AS; the components Sj, j 1, ...,p of S are assumed to be independently distributed.