AITopics

Dimension reduction provides compact representations for storage, transmission, and classification. Dimension reduction algorithms operate by identifying and eliminating statistical redundancies in the data. The optimal linear technique for dimension reduction is principal component analysis (PCA).

algorithm, dimension reduction, reduction, (13 more...)

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Österberg, Mats, Lenz, Reiner

Unsupervised Parallel Feature Extraction from First Principles

We describe a number of learning rules that can be used to train unsupervised parallel feature extraction systems. The learning rules are derived using gradient ascent of a quality function. We consider a number of quality functions that are rational functions of higher order moments of the extracted feature values. We show that one system learns the principle components of the correlation matrix. Principal component analysis systems are usually not optimal feature extractors for classification.

eigenvector, filter function, quality function, (12 more...)

Country:

Europe > Sweden > Östergötland County > Linköping (0.06)
Europe > Finland > Uusimaa > Helsinki (0.04)
Europe > Finland > South Karelia > Lappeenranta (0.04)

Technology:

Information Technology > Data Science > Data Mining > Feature Extraction (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.52)

Ghahramani, Zoubin, Jordan, Michael I.

Supervised learning from incomplete data via an EM approach

Real-world learning tasks may involve high-dimensional data sets with arbitrary patterns of missing data. In this paper we present a framework based on maximum likelihood density estimation for learning from such data set.s. VVe use mixture models for the density estimates and make two distinct appeals to the Expectation Maximization (EM) principle (Dempster et al., 1977) in deriving a learning algorithm-EM is used both for the estimation of mixture components and for coping wit.h missing dat.a. The resulting algorithm is applicable t.o a wide range of supervised as well as unsupervised learning problems.

algorithm, gaussian, incomplete data, (13 more...)

Country:

Asia > Middle East > Jordan (0.16)
North America > United States > New York (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.90)

Learning Classification with Unlabeled Data

Sa, Virginia R. de

We represent objects with n-dimensional pattern vectors and consider piecewise-linear classifiers consisting of a collection of (labeled) codebook vectors in the space of the input patterns (See Figure 1). The classification boundaries are gi ven by the voronoi tessellation of the codebook vectors. Patterns are said to belong to the class (given by the label) of the codebook vector to which they are closest.

codebook vector, modality, vector, (17 more...)

Country:

North America > United States > Virginia (0.05)
North America > United States > New York > Monroe County > Rochester (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Buhmann, Joachim, Hofmann, Thomas

Central and Pairwise Data Clustering by Competitive Neural Networks

Data clustering amounts to a combinatorial optimization problem to reduce the complexity of a data representation and to increase its precision. Central and pairwise data clustering are studied in the maximum entropy framework. For central clustering we derive a set of reestimation equations and a minimization procedure which yields an optimal number of clusters, their centers and their cluster probabilities. A meanfield approximation for pairwise clustering is used to estimate assignment probabilities. A se1fconsistent solution to multidimensional scaling and pairwise clustering is derived which yields an optimal embedding and clustering of data points in a d-dimensional Euclidian space. 1 Introduction A central problem in information processing is the reduction of the data complexity with minimal loss in precision to discard noise and to reveal basic structure of data sets. Data clustering addresses this tradeoff by optimizing a cost function which preserves the original data as complete as possible and which simultaneously favors prototypes with minimal complexity (Linde et aI., 1980; Gray, 1984; Chou et aI., 1989; Rose et ai., 1990). We discuss an objective function for the joint optimization of distortion errors and the complexity of a reduced data representation. A maximum entropy estimation of the cluster assignments yields a unifying framework for clustering algorithms with a number of different distortion and complexity measures. The close analogy of complexity optimized clustering with winner-take-all neural networks suggests a neural-like implementation resembling topological feature maps (see Figure 1).

central and pairwise data clustering, complexity, free energy, (11 more...)

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Bengio, Yoshua, Frasconi, Paolo

Credit Assignment through Time: Alternatives to Backpropagation

Learning to recognize or predict sequences using long-term context has many applications. However, practical and theoretical problems are found in training recurrent neural networks to perform tasks in which input/output dependencies span long intervals. Starting from a mathematical analysis of the problem, we consider and compare alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled. Results on the new algorithms show performance qualitatively superior to that obtained with backpropagation. 1 Introduction Recurrent neural networks have been considered to learn to map input sequences to output sequences. Machines that could efficiently learn such tasks would be useful for many applications involving sequence prediction, recognition or production. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. In fact, we can prove that dynamical systems such as recurrent neural networks will be increasingly difficult to train with gradient descent as the duration of the dependencies to be captured increases. A mathematical analysis of the problem shows that either one of two conditions arises in such systems.

algorithm, information, sequence, (13 more...)

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > Middle East > Jordan (0.05)
Europe > Italy (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.52)

Maron, Oded, Moore, Andrew W.

Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation

Such decisions include which function approximator to use, how to trade smoothness for goodness of fit and which features are relevant.

algorithm, hoeffding race, iteration, (12 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.40)

Bregler, Christoph, Omohundro, Stephen M.

Surface Learning with Applications to Lipreading

Most connectionist research has focused on learning mappings from one space to another (eg.

dimension, query, surface learning, (15 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.05)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > Illinois (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Levin, Asriel U., Leen, Todd K., Moody, John E.

Fast Pruning Using Principal Components

The assumption is that there exists an underlying (possibly noisy) functional relationship relating the outputs to the inputs y /(u,e) where e denotes the noise. The aim of the learning process is to approximate this relationship based on the the training set.

algorithm, correlation matrix, hessian, (14 more...)

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Dietterich, Thomas G., Wettschereck, Dietrich, Atkeson, Chris G., Moore, Andrew W.

Memory-Based Methods for Regression and Classification

Memory-based learning methods operate by storing all (or most) of the training data and deferring analysis of that data until "run time" (i.e., when a query is presented and a decision or prediction must be made). When a query is received, these methods generally answer the query by retrieving and analyzing a small subset of the training data-namely, data in the immediate neighborhood of the query point. In short, memory-based methods are "lazy" (they wait until the query) and "local" (they use only a local neighborhood). The purpose of this workshop was to review the state-of-the-art in memory-based methods and to understand their relationship to "eager" and "global" learning algorithms such as batch backpropagation. There are two essential components to any memory-based algorithm: the method for defining the "local neighborhood" and the learning method that is applied to the training examples in the local neighborhood.

algorithm, kernel, training example, (14 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > United States > Oregon > Benton County > Corvallis (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.66)