AITopics

When combining a set of learned models to form an improved estimator, the issue of redundancy or multicollinearity in the set of models must be addressed. A progression of existing approaches and their limitations with respect to redundancy is discussed. A new approach, PCR*, based on principal components regression is proposed to address these limitations. An evaluation of the new approach on a collection of domains reveals that: 1) PCR* was the most robust combination method as the redundancy of the learned models increased, 2) redundancy could be handled without eliminating any of the learned models, and 3) the principal components of the learned models provided a continuum of "regularized" weights from which PCR* could choose.

principal component, redundancy, regression

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Orange County > Irvine (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.44)

Lee, Daniel D., Seung, H. Sebastian

Unsupervised Learning by Convex and Conic Coding

Unsupervised learning algorithms based on convex and conic encoders are proposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms are used to model handwritten digits and compared with vector quantization and principal component analysis.
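The conic encoder described above can be illustrated with a small sketch. The abstract does not say how the closest conic combination is computed, so the projected gradient descent used here is an assumption, as are the function name `conic_encode`, the step size, and the iteration count. The sketch only covers encoding with a fixed set of basis vectors, not the learning of the basis itself.

```python
def conic_encode(basis, x, steps=500, lr=0.1):
    """Find non-negative coefficients a minimizing ||x - sum_j a[j] * basis[j]||^2.

    Hypothetical helper: projected gradient descent is an illustrative choice,
    not necessarily the optimizer used in the paper. lr must be small enough
    for the given basis (0.1 is safe for an orthonormal basis).
    """
    k = len(basis)
    a = [0.0] * k
    for _ in range(steps):
        # Current reconstruction of x from the basis vectors.
        recon = [sum(a[j] * basis[j][i] for j in range(k)) for i in range(len(x))]
        resid = [r - xi for r, xi in zip(recon, x)]
        # Gradient of the squared error with respect to each coefficient
        # (up to a constant factor absorbed into lr).
        grad = [sum(b * r for b, r in zip(basis[j], resid)) for j in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        a = [max(0.0, aj - lr * gj) for aj, gj in zip(a, grad)]
    return a


coeffs = conic_encode([[1.0, 0.0], [0.0, 1.0]], [2.0, -1.0])
```

With this basis, the negative second component of the input cannot be represented, so its coefficient stays at zero. A convex encoder would additionally constrain the coefficients to sum to one.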

algorithm, basis vector, encoder

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Combinations of Weak Classifiers

Ji, Chuanyi, Ma, Sheng

To obtain classification systems with both good generalization performance and efficiency in space and time, we propose a learning method based on combinations of weak classifiers, where weak classifiers are linear classifiers (perceptrons) which can do a little better than making random guesses. A randomized algorithm is proposed to find the weak classifiers. They are then combined through a majority vote. As demonstrated through systematic experiments, the method developed is able to obtain combinations of weak classifiers with good generalization performance and a fast training time on a variety of test problems and real applications.
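A minimal sketch of the idea, under stated assumptions: the abstract does not describe the randomized algorithm, so the rejection sampling below (draw random weights, keep a perceptron once it beats a chance-level threshold) is only a stand-in. The names `random_weak_classifier` and `majority_vote`, the 0.55 threshold, and the weight range are all hypothetical.

```python
import random


def perceptron_predict(w, x):
    """Linear classifier; w[-1] is the bias. Returns +1 or -1."""
    s = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else -1


def accuracy(w, data):
    """Fraction of (x, label) pairs that w classifies correctly."""
    return sum(perceptron_predict(w, x) == y for x, y in data) / len(data)


def random_weak_classifier(data, rng, min_acc=0.55, max_tries=1000):
    """Draw random perceptrons until one is slightly better than chance.

    Assumption: plain rejection sampling, not the paper's algorithm.
    """
    dim = len(data[0][0])
    for _ in range(max_tries):
        w = [rng.uniform(-1.0, 1.0) for _ in range(dim + 1)]
        if accuracy(w, data) >= min_acc:
            return w
    raise RuntimeError("no weak classifier found")


def majority_vote(classifiers, x):
    """Combine weak classifiers by an unweighted majority vote (ties go to +1)."""
    total = sum(perceptron_predict(w, x) for w in classifiers)
    return 1 if total >= 0 else -1


# Toy data: grid points labeled by the sign of x0 + x1.
data = [((i, j), 1 if i + j >= 0 else -1) for i in range(-3, 4) for j in range(-3, 4)]
rng = random.Random(0)
ensemble = [random_weak_classifier(data, rng) for _ in range(15)]
ensemble_acc = sum(majority_vote(ensemble, x) == y for x, y in data) / len(data)
```

An odd ensemble size avoids tied votes. The sketch makes no claim about how much the vote improves on the individual classifiers.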

algorithm, classifier, weak classifier

Country:

North America > United States > New York > Rensselaer County > Troy (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.47)

Industry: Education (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.32)

Hyvärinen, Aapo, Oja, Erkki

One-unit Learning Rules for Independent Component Analysis

Neural one-unit learning rules for the problem of Independent Component Analysis (ICA) and blind source separation are introduced. In these new algorithms, every ICA neuron develops into a separator that finds one of the independent components. The learning rules use very simple constrained Hebbian/anti-Hebbian learning in which decorrelating feedback may be added. To speed up the convergence of these stochastic gradient descent rules, a novel computationally efficient fixed-point algorithm is introduced. Independent Component Analysis (ICA) (Comon, 1994; Jutten and Herault, 1991) is a signal processing technique whose goal is to express a set of random variables as linear combinations of statistically independent component variables. The main applications of ICA are in blind source separation, feature extraction, and blind deconvolution.
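The abstract does not spell out the fixed-point rule. As an illustration only, the sketch below assumes the kurtosis-based one-unit iteration w ← E[x (wᵀx)³] − 3w followed by normalization, applied to data that is already whitened. The helper names, the uniform sources, and the rotation used as a mixing matrix are all assumptions made for this toy example.

```python
import math
import random


def fixed_point_ica_unit(xs, w, iterations=20):
    """One-unit fixed-point iteration on (approximately) whitened data.

    Assumed update: w <- mean(x * (w.x)^3) - 3 * w, then normalize to unit length.
    """
    n = len(xs)
    for _ in range(iterations):
        new_w = [0.0] * len(w)
        for x in xs:
            proj = sum(wi * xi for wi, xi in zip(w, x))
            cube = proj ** 3
            for i, xi in enumerate(x):
                new_w[i] += xi * cube / n
        new_w = [nw - 3.0 * wi for nw, wi in zip(new_w, w)]
        norm = math.sqrt(sum(v * v for v in new_w))
        w = [v / norm for v in new_w]
    return w


def make_whitened_mixture(n, theta, seed=0):
    """Two independent unit-variance uniform sources mixed by a rotation.

    A rotation keeps the mixture white in expectation, so no separate
    whitening step is needed in this toy setting.
    """
    rng = random.Random(seed)
    half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance
    c, s = math.cos(theta), math.sin(theta)
    xs = []
    for _ in range(n):
        s1 = rng.uniform(-half_width, half_width)
        s2 = rng.uniform(-half_width, half_width)
        xs.append((c * s1 - s * s2, s * s1 + c * s2))
    return xs


theta = 0.5
xs = make_whitened_mixture(2000, theta)
w = fixed_point_ica_unit(xs, [1.0, 0.0])
# Alignment of w with the first mixing direction, up to sign.
alignment = abs(w[0] * math.cos(theta) + w[1] * math.sin(theta))
```

In this setup the weight vector should align, up to sign, with one column of the rotation matrix, so that the projection wᵀx recovers one source. Real data would need an explicit whitening step first.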

algorithm, fixed-point algorithm, kurtosis

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
North America > United States > Colorado > Denver County > Denver (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Fritsch, Jürgen, Finke, Michael, Waibel, Alex

Adaptively Growing Hierarchical Mixtures of Experts

We propose a novel approach to automatically growing and pruning Hierarchical Mixtures of Experts. The constructive algorithm proposed here enables large hierarchies consisting of several hundred experts to be trained effectively. We show that HMEs trained by our automatic growing procedure yield better generalization performance than traditional static and balanced hierarchies. Evaluation of the algorithm is performed (1) on vowel classification and (2) within a hybrid version of the JANUS [9] speech recognition system using a subset of the Switchboard large-vocabulary speaker-independent continuous speech recognition database.

hierarchical mixture, hme, probability

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.90)

Limitations of Self-organizing Maps for Vector Quantization and Multidimensional Scaling

Flexer, Arthur

SOM can be said to perform clustering/vector quantization (VQ) while at the same time preserving the spatial ordering of the input data, reflected by an ordering of the code book vectors (cluster centroids) in a one- or two-dimensional output space; the latter property is closely related to multidimensional scaling (MDS) in statistics. Although the level of activity and research around the SOM algorithm is quite large (a recent overview by [Kohonen 95] contains more than 1000 citations), there has been little comparison among the numerous existing variants of the basic approach, or with more traditional statistical techniques from the larger frameworks of VQ and MDS. Additionally, there is little advice in the literature about how to use SOM properly in order to get optimal results in terms of either VQ or MDS, or possibly both. To make the notion of SOM as a tool for "data visualization" more precise, the following question has to be answered: should SOM be used for VQ, MDS, both at the same time, or neither? Two recent comprehensive studies comparing SOM to traditional VQ or MDS techniques separately seem to indicate that SOM is not competitive when used for either VQ or MDS: [Balakrishnan et al. 94] compare SOM to K-means clustering on 108 multivariate normal clustering problems with known clustering solutions and show that SOM performs significantly worse in terms of data points misclassified

algorithm, code book vector, vector

Country:

Europe > Austria > Vienna (0.15)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.72)

Dunmur, A. P., Titterington, D. M.

On a Modification to the Mean Field EM Algorithm in Factorial Learning

A modification is described to the use of mean field approximations in the E step of EM algorithms for analysing data from latent structure models, as described by Ghahramani (1995), among others. The modification involves second-order Taylor approximations to expectations computed in the E step. The potential benefits of the method are illustrated using very simple latent profile models.

algorithm, approximation, latent variable

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Estimating Equivalent Kernels for Neural Networks: A Data Perturbation Approach

Burgess, A. Neil

The perturbation method which we have presented overcomes the limitations of standard approaches, which are only appropriate for models with a single layer of adjustable weights, albeit at considerable computational expense. It has the added bonus of automatically taking into account the effect of regularisation techniques such as weight decay. The experimental results illustrate the application of the technique to two simple problems. As expected, the number of degrees of freedom in the models is found to be related to the amount of weight decay used during training. The equivalent kernels are found to vary significantly in different regions of input space and the functions reconstructed from the estimated smoother matrices closely match the original

equivalent kernel, kernel, neural network

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Burges, Christopher J. C., Schölkopf, Bernhard

Improving the Accuracy and Speed of Support Vector Machines

Support Vector Learning Machines (SVM) are finding application in pattern recognition, regression estimation, and operator inversion for ill-posed problems. Against this very general backdrop, any methods for improving the generalization performance, or for improving the speed in test phase, of SVMs are of increasing interest. In this paper we combine two such techniques on a pattern recognition problem. The method for improving generalization performance (the "virtual support vector" method) does so by incorporating known invariances of the problem. This method reduces the error rate on 10,000 NIST test digit images from 1.4% to 1.0%.

support vector, vapnik, vector

Country:

North America > United States > New York (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Bradley, Paul S., Mangasarian, Olvi L., Street, W. Nick

Clustering via Concave Minimization

There are many approaches to this problem, including statistical [9], machine learning [7], integer and mathematical programming [18,1]. In this paper we concentrate on a simple concave minimization formulation of the problem that leads to a finite and fast algorithm.

algorithm, correctness, k-median algorithm

Country:

North America > United States > Oklahoma > Payne County > Stillwater (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Industry: Health & Medicine (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)