AITopics

We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under t he assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on t he submanifold in question rather than the total ambient space. Using the Laplace Beltrami operator one produces a basis for a Hilbert space of square integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once a basis is obtained, training can be performed using the labeled data set. Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace Beltrami operator by the graph Laplacian. Practical applications to image and text classification are considered.

classifier, eigenfunction, manifold, (16 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Sokolova, Marina, Marchand, Mario, Japkowicz, Nathalie, Shawe-taylor, John S.

The Decision List Machine

We introduce a new learning algorithm for decision lists to allow features that are constructed from the data and to allow a tradeoff between accuracy and complexity. We bound its generalization error in terms of the number of errors and the size of the classifier it finds on the training data. We also compare its performance on some natural data sets with the set covering machine and the support vector machine.

algorithm, compression, dlm, (15 more...)

Country:

North America > Canada > Ontario > National Capital Region > Ottawa (0.05)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Hughes, Nicholas P., Lowe, David

Artefactual Structure from Least-Squares Multidimensional Scaling

We consider the problem of illusory or artefactual structure from the visualisation of high-dimensional structureless data. In particular we examine the role of the distance metric in the use of topographic mappings based on the statistical field of multidimensional scaling. We show that the use of a squared Euclidean metric (i.e. the SS

configuration, data space, mapping, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Italy > Veneto > Venice (0.04)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Hochreiter, Sepp, Obermayer, Klaus

Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

We investigate the problem of learning a classification task for datasets which are described by matrices. Rows and columns of these matrices correspond to objects, where row and column objects may belong to different sets, and the entries in the matrix express the relationships between them. We interpret the matrix elements as being produced by an unknown kernel which operates on object pairs and we show that - under mild assumptions - these kernels correspond to dot products in some (unknown) feature space. Minimizing a bound for the generalization error of a linear classifier which has been obtained using covering numbers we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices which are not positive definite, and not even symmetric or square.

classifier, matrix, selection, (14 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Germany > Berlin (0.04)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.49)
Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)

Weston, Jason, Chapelle, Olivier, Vapnik, Vladimir, Elisseeff, André, Schölkopf, Bernhard

Kernel Dependency Estimation

We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be for example: vectors, images, strings, trees or graphs. Such a task is made possible by employing similarity measures in both input and output spaces using kernel functions, thus embedding the objects into vector spaces. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images. 1 Introduction In this article we consider the rather general learning problem of finding a dependency between inputs x E X and outputs y E Y given a training set (Xl,yl),...,(xm, Ym) E X x Y This includes conventional pattern recognition and regression estimation. It also encompasses more complex dependency estimation tasks, e.g mapping of a certain class of strings to a certain class of graphs (as in text parsing) or the mapping of text descriptions to images.

algorithm, kernel, loss function, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > New York (0.04)
(2 more...)

Industry: Education > Focused Education > Special Education (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Informed Projections

Cohn, David

Low rank approximation techniques are widespread in pattern recognition research -- they include Latent Semantic Analysis (LSA), Probabilistic LSA, Principal Components Analysus (PCA), the Generative Aspect Model, and many forms of bibliometric analysis. All make use of a low-dimensional manifold onto which data are projected. Such techniques are generally "unsupervised," which allows them to model data in the absence of labels or categories. With many practical problems, however, some prior knowledge is available in the form of context. In this paper, I describe a principled approach to incorporating such information, and demonstrate its application to PCA-based approximations of several data sets.

experiment, information, projection, (17 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Teh, Yee W., Roweis, Sam T.

Automatic Alignment of Local Representations

We present an automatic alignment procedure which maps the disparate internal representations learned by several local dimensionality reduction experts into a single, coherent global coordinate system for the original data space. Our algorithm can be applied to any set of experts, each of which produces a low-dimensional local representation of a highdimensional input. Unlike recent efforts to coordinate such models by modifying their objective functions [1, 2], our algorithm is invoked after training and applies an efficient eigensolver to post-process the trained models. The post-processing has no local optima and the size of the system it must solve scales with the number of local models rather than the number of original data points, making it more efficient than model-free algorithms such as Isomap [3] or LLE [4].

factor analyzer, global coordinate, representation, (14 more...)

Country: North America > Canada > Ontario > Toronto (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Vincent, Pascal, Bengio, Yoshua

Manifold Parzen Windows

The similarity between objects is a fundamental element of many learning algorithms. Most nonparametric methods take this similarity to be fixed, but much recent work has shown the advantages of learning it, in particular to exploit the local invariances in the data or to capture the possibly nonlinear manifold on which most of the data lies. We propose a new nonparametric kernel density estimation method which captures the local structure of an underlying manifold through the leading eigenvectors of regularized local covariance matrices.

covariance matrix, gaussian, manifold, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Canada > Quebec > Montreal (0.05)
Europe > Germany > Schleswig-Holstein > Kiel (0.05)
North America > Canada > Ontario > Toronto (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Roth, Volker, Laub, Julian, Müller, Klaus-Robert, Buhmann, Joachim M.

Going Metric: Denoising Pairwise Data

Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multidimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. 1 Introduction Unsupervised grouping or clustering aims at extracting hidden structure from data (see e.g.

matrix, pairwise data, sequence, (14 more...)

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Europe > Germany > Brandenburg > Potsdam (0.04)
Europe > Germany > Berlin (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Strohmann, Thomas, Grudic, Gregory Z.

A Formulation for Minimax Probability Machine Regression

We formulate the regression problem as one of maximizing the minimum probability, symbolized by Ω, that future predicted outputs of the regression model will be within some ε bound of the true regression function. Our formulation is unique in that we obtain a direct estimate of this lower probability bound Ω. The proposed framework, minimax probability machine regression (MPMR), is based on the recently described minimax probability machine classification algorithm [Lanckriet et al.] and uses Mercer Kernels to obtain nonlinear regression models. MPMR is tested on both toy and real world data, verifying the accuracy of the Ω bound, and the efficacy of the regression models.

probability, regression function, regression model, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)