Country
Artefactual Structure from Least-Squares Multidimensional Scaling
Hughes, Nicholas P., Lowe, David
We consider the problem of illusory or artefactual structure from the visualisation ofhigh-dimensional structureless data. In particular we examine the role of the distance metric in the use of topographic mappings based on the statistical field of multidimensional scaling. We show that the use of a squared Euclidean metric (i.e. the SSTRESS measure) gives rise to an annular structure when the input data is drawn from a highdimensional isotropicdistribution, and we provide a theoretical justification for this observation.
Robust Novelty Detection with Single-Class MPM
Ghaoui, Laurent E., Jordan, Michael I., Lanckriet, Gert R.
This algorithm-the "single-class minimax probability machine(MPM)"- is built on a distribution-free methodology that minimizes the worst-case probability of a data point falling outside of a convex set, given only the mean and covariance matrix of the distribution and making no further distributional assumptions. Wepresent a robust approach to estimating the mean and covariance matrix within the general two-class MPM setting, and show how this approach specializes to the single-class problem. We provide empirical results comparing the single-class MPM to the single-class SVM and a two-class SVM method. 1 Introduction Novelty detection is an important unsupervised learning problem in which test data are to be judged as having been generated from the same or a different process as that which generated the training data.
Extracting Relevant Structures with Side Information
The problem of extracting the relevant aspects of data, in face of multiple conflicting structures, is inherent to modeling of complex data. Extracting structurein one random variable that is relevant for another variable has been principally addressed recently via the information bottleneck method [15]. However, such auxiliary variables often contain more information thanis actually required due to structures that are irrelevant for the task. In many other cases it is in fact easier to specify what is irrelevant than what is, for the task at hand. Identifying the relevant structures, however, can thus be considerably improved by also minimizing theinformation about another, irrelevant, variable. In this paper we give a general formulation of this problem and derive its formal, as well as algorithmic, solution. Its operation is demonstrated in a synthetic example and in two real world problems in the context of text categorization andface images. While the original information bottleneck problem is related to rate distortion theory, with the distortion measure replaced by the relevant information, extracting relevant features while removing irrelevant ones is related to rate distortion with side information.
Automatic Alignment of Local Representations
We present an automatic alignment procedure which maps the disparate internal representations learned by several local dimensionality reduction experts into a single, coherent global coordinate system for the original data space. Our algorithm can be applied to any set of experts, each of which produces a low-dimensional local representation of a highdimensional input.Unlike recent efforts to coordinate such models by modifying their objective functions [1, 2], our algorithm is invoked after training and applies an efficient eigensolver to post-process the trained models. The post-processing has no local optima and the size of the system itmust solve scales with the number of local models rather than the number of original data points, making it more efficient than model-free algorithms such as Isomap [3] or LLE [4].
Manifold Parzen Windows
Vincent, Pascal, Bengio, Yoshua
The similarity between objects is a fundamental element of many learning algorithms.Most nonparametric methods take this similarity to be fixed, but much recent work has shown the advantages of learning it, in particular to exploit the local invariances in the data or to capture the possibly nonlinear manifold on which most of the data lies. We propose a new nonparametric kernel density estimation method which captures the local structure of an underlying manifold through the leading eigenvectors ofregularized local covariance matrices.
Going Metric: Denoising Pairwise Data
Roth, Volker, Laub, Julian, Mรผller, Klaus-Robert, Buhmann, Joachim M.
Pairwise data in empirical sciences typically violate metricity, either dueto noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multidimensional scaling (MDS) that allows us to apply a variety of classical machine learningand signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step.Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation.We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. 1 Introduction Unsupervised grouping or clustering aims at extracting hidden structure from data (see e.g.
Constraint Classification for Multiclass Classification and Ranking
Har-Peled, Sariel, Roth, Dan, Zimak, Dav
We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.
A Differential Semantics for Jointree Algorithms
Park, James D., Darwiche, Adnan
A new approach to inference in belief networks has been recently proposed, which is based on an algebraic representation of belief networks using multi-linear functions. According to this approach, the key computational question is that of representing multi-linear functions compactly, since inference reduces to a simple process of ev aluating and differentiating such functions. W e show here that mainstream inference algorithms based on jointrees are a special case of this approach in a v ery precise sense. W e use this result to prov e new properties of jointree algorithms, and then discuss some of its practical and theoretical implications.
VIBES: A Variational Inference Engine for Bayesian Networks
Bishop, Christopher M., Spiegelhalter, David, Winn, John
In recent years variational methods have become a popular tool for approximate inference and learning in a wide variety of probabilistic models.For each new application, however, it is currently necessary first to derive the variational update equations, and then to implement them in application-specific code. Each of these steps is both time consuming and error prone. In this paper we describe a general purpose inference engine called VIBES ('Variational Inference forBayesian Networks') which allows a wide variety of probabilistic modelsto be implemented and solved variationally without recourse to coding. New models are specified either through a simple script or via a graphical interface analogous to a drawing package. VIBES then automatically generates and solves the variational equations.We illustrate the power and flexibility of VIBES using examples from Bayesian mixture modelling.