AITopics

This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm chooses a nonlinear transformation such that transformed data is well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation.
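As a hedged illustration of what learning the transformation involves, assume a monotonic warping function $f$ applied elementwise to the observed targets $t_1, \dots, t_N$, a zero-mean GP, and a covariance matrix $K$ (noise included) on the inputs $X$. These symbols are introduced here and are not taken from the abstract. A standard change of variables then gives the negative log likelihood in the observation space. The Jacobian term is what separates a learned transformation from a fixed preprocessing step. This is a sketch of the general argument, not necessarily the paper's exact notation.

```latex
-\log p(\mathbf{t} \mid X)
  = \tfrac{1}{2}\log\det K
  + \tfrac{1}{2}\, f(\mathbf{t})^{\top} K^{-1} f(\mathbf{t})
  - \sum_{n=1}^{N} \log \left.\frac{\partial f(t)}{\partial t}\right|_{t = t_n}
  + \tfrac{N}{2}\log 2\pi
```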

gaussian process, transformation, warped gp

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Modeling & Simulation (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

Lawrence, Neil D.

In this paper we introduce a new underlying probabilistic model for principal component analysis (PCA). Our formulation interprets PCA as a particular Gaussian process prior on a mapping from a latent space to the observed data-space. We show that if the prior's covariance function constrains the mappings to be linear, the model is equivalent to PCA; we then extend the model by considering less restrictive covariance functions which allow nonlinear mappings. This more general Gaussian process latent variable model (GPLVM) is then evaluated as an approach to the visualisation of high dimensional data for three different data-sets. Additionally, our nonlinear algorithm can be further kernelised, leading to 'twin kernel PCA', in which a mapping between feature spaces occurs.
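The link between PCA and a Gaussian process prior can be sketched as follows. The notation is assumed here and does not appear in the abstract: $Y \in \mathbb{R}^{N \times D}$ is the centred data matrix with columns $\mathbf{y}_{:,d}$, $X \in \mathbb{R}^{N \times q}$ holds the latent positions, and $\beta$ is a noise precision. With a linear covariance function, the marginal likelihood factorises over data dimensions, and maximising it with respect to $X$ recovers a PCA solution. Replacing $K$ with a nonlinear covariance function gives the more general model.

```latex
p(Y \mid X, \beta) = \prod_{d=1}^{D} \mathcal{N}\!\left(\mathbf{y}_{:,d} \mid \mathbf{0},\, K\right),
\qquad K = X X^{\top} + \beta^{-1} I

\log p(Y \mid X, \beta)
  = -\tfrac{DN}{2}\log 2\pi
  - \tfrac{D}{2}\log\det K
  - \tfrac{1}{2}\operatorname{tr}\!\left(K^{-1} Y Y^{\top}\right)
```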

component analysis, latent space, visualisation

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Europe > United Kingdom (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.72)

Zhou, Dengyong, Bousquet, Olivier, Lal, Thomas N., Weston, Jason, Schölkopf, Bernhard

Learning with Local and Global Consistency

We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.
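The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to obtain a smooth labelling on a graph: build an RBF affinity matrix, normalise it symmetrically, and repeatedly mix each point's neighbour scores with its initial label. The function names, the default parameters `sigma`, `alpha` and `n_iter`, and the toy data are all assumptions made for illustration.

```python
import math

def rbf_affinity(points, sigma):
    """Affinity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) with a zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W

def normalize(W):
    """Symmetric normalisation S = D^{-1/2} W D^{-1/2}, D = diagonal of row sums."""
    n = len(W)
    deg = [sum(row) for row in W]  # assumes every point has a non-zero degree
    return [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def spread_labels(points, labels, n_classes, sigma=1.0, alpha=0.9, n_iter=200):
    """labels[i] is a class index, or None for an unlabeled point."""
    n = len(points)
    S = normalize(rbf_affinity(points, sigma))
    # One-hot initial scores; unlabeled rows are all zeros.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        # Each point takes alpha of its neighbours' scores and keeps
        # (1 - alpha) of its own initial label information.
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n))
              + (1.0 - alpha) * Y[i][c]
              for c in range(n_classes)]
             for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]

# Toy usage: two well-separated groups on a line, one labeled point in each.
points = [(0.0,), (0.5,), (1.0,), (5.0,), (5.5,), (6.0,)]
labels = [0, None, None, None, None, 1]
predicted = spread_labels(points, labels, n_classes=2)
```

In this toy case the unlabeled points inherit the label of the labeled point in their own group, which is the sense in which unlabeled data shapes the decision.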

algorithm, consistency, rbf kernel

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.98)

Verbeek, Jakob J., Roweis, Sam T., Vlassis, Nikos

Non-linear CCA and PCA by Alignment of Local Models

We propose a nonlinear Canonical Correlation Analysis (CCA) method which works by coordinating or aligning mixtures of linear models. In the same way that CCA extends the idea of PCA, our work extends recent methods for nonlinear dimensionality reduction to the case where multiple embeddings of the same underlying low dimensional coordinates are observed, each lying on a different high dimensional manifold. We also show that a special case of our method, when applied to only a single manifold, reduces to the Laplacian Eigenmaps algorithm. As with previous alignment schemes, once the mixture models have been estimated, all of the parameters of our model can be estimated in closed form without local optima in the learning. Experimental results illustrate the viability of the approach as a nonlinear extension of CCA.

latent space, manifold, projection

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Germany (0.04)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Sparse Representation and Its Applications in Blind Source Separation

Li, Yuanqing, Amari, Shun-ichi, Shishkin, Sergei, Cao, Jianting, Gu, Fanji, Cichocki, Andrzej S.

In this paper, sparse representation (factorization) of a data matrix is first discussed. An overcomplete basis matrix is estimated by using the K-means method.

matrix, sparse representation, vector

Country:

Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Carreras, Xavier, Màrquez, Lluís

Online Learning via Global Feedback for Phrase Recognition

This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which reflects to each local function its committed errors from a global point of view, and allows the functions to be trained together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and demonstrates the advantages of our global training method over optimizing each function locally and independently.

classifier, phrase candidate, score function

Country:

Europe > Spain > Catalonia (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.95)

Bottou, Léon, LeCun, Yann

Large Scale Online Learning

We consider situations where training data is abundant and computing resources are comparatively scarce. We argue that suitably designed online learning algorithms asymptotically outperform any batch learning algorithm. Both theoretical and experimental evidence is presented.
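To make the online setting concrete, here is a minimal sketch of an online learner that sees each example once and updates immediately, rather than revisiting a stored training set as a batch method would. It is not the authors' algorithm: the one-parameter least-squares model, the constant `learning_rate`, and the noise-free toy data are assumptions chosen to keep the example short.

```python
import random

def online_least_squares(stream, learning_rate=0.1):
    """Single pass of stochastic gradient descent for the model y ~ w * x."""
    w = 0.0
    for x, y in stream:
        # The gradient of 0.5 * (y - w*x)**2 with respect to w is -(y - w*x) * x,
        # so a descent step adds learning_rate * (y - w*x) * x.
        w += learning_rate * (y - w * x) * x
    return w

# Toy usage: noise-free data with a true slope of 3.0 (hypothetical values).
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
data = [(x, 3.0 * x) for x in xs]
w_hat = online_least_squares(data)
```

Each update costs a constant amount of work regardless of how much data has been seen, which is why this style of learner suits the regime where examples are plentiful and computation is the bottleneck.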

algorithm, convergence, iteration

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.63)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Barber, David, Agakov, Felix V.

Information Maximization in Noisy Channels: A Variational Approach

The maximisation of information transmission over noisy channels is a common, albeit generally computationally difficult, problem. We approach the difficulty of computing the mutual information for noisy channels by using a variational approximation. The resulting IM algorithm is analogous to the EM algorithm, yet maximises mutual information, as opposed to likelihood. We apply the method to several practical examples, including linear compression, population encoding and CDMA.
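One way to see how a variational approximation helps is the following lower bound, written in notation assumed here rather than taken from the abstract: $x$ is the source, $y$ the channel output, $p(y \mid x)$ the encoder with noise, and $q(x \mid y)$ an arbitrary tractable decoder distribution. The bound follows from the non-negativity of the Kullback-Leibler divergence between the true posterior $p(x \mid y)$ and $q(x \mid y)$, and it is tight when the two coincide.

```latex
I(X; Y) = H(X) - H(X \mid Y)
  \;\ge\; H(X) + \big\langle \log q(x \mid y) \big\rangle_{p(x, y)}

I(X; Y) - \Big( H(X) + \big\langle \log q(x \mid y) \big\rangle_{p(x, y)} \Big)
  = \Big\langle \mathrm{KL}\big( p(x \mid y) \,\|\, q(x \mid y) \big) \Big\rangle_{p(y)} \;\ge\; 0
```

Under these assumptions, an EM-like scheme alternates between maximising the bound over the decoder $q$ with the encoder fixed and maximising it over the encoder parameters with $q$ fixed.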

algorithm, approximation, decoder

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Zhou, Dengyong, Weston, Jason, Gretton, Arthur, Bousquet, Olivier, Schölkopf, Bernhard

Ranking on Data Manifolds

The Google search engine has enjoyed huge success with its web page ranking algorithm, which exploits the global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in a Euclidean space, such as text or image data. The core idea of our method is to rank the data with respect to the intrinsic manifold structure collectively revealed by a large amount of data. Encouraging experimental results from synthetic, image, and text data illustrate the validity of our method.

algorithm, query, ranking score

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.05)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Welling, Max, Williams, Christopher, Agakov, Felix V.

Extreme Components Analysis

Principal components analysis (PCA) is one of the most widely used techniques in machine learning and data mining. Minor components analysis (MCA) is less well known, but can also play an important role in the presence of constraints on the data distribution. In this paper we present a probabilistic model for "extreme components analysis" (XCA) which at the maximum likelihood solution extracts an optimal combination of principal and minor components. For a given number of components, the log-likelihood of the XCA model is guaranteed to be greater than or equal to that of the probabilistic models for PCA and MCA. We describe an efficient algorithm to solve for the globally optimal solution. For log-convex spectra we prove that the solution consists of principal components only, while for log-concave spectra the solution consists of minor components. In general, the solution admits a combination of both. In experiments we explore the properties of XCA on some synthetic and real-world datasets.

eigenvalue, minor component, spectrum

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.90)