AITopics

Fishers linear discriminant analysis (LDA) is a classical multivariate technique both for dimension reduction and classification. The data vectors are transformed into a low dimensional subspace such that the class centroids are spread out as much as possible. In this subspace LDA works as a simple prototype classifier with linear decision boundaries. However, in many applications the linear boundaries do not adequately separate the classes. We present a nonlinear generalization of discriminant analysis that uses the kernel trick of representing dot products by kernel functions.

algorithm, discriminant analysis, vector, (16 more...)

Country:

North America > United States > California > Monterey County > Monterey (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.74)

Rätsch, Gunnar, Schölkopf, Bernhard, Smola, Alex J., Müller, Klaus-Robert, Onoda, Takashi, Mika, Sebastian

v-Arc: Ensemble Learning in the Presence of Outliers

The idea of a large minimum margin [17] explains the good generalization performance of AdaBoost in the low noise regime. However, AdaBoost performs worse on noisy tasks [10, 11], such as the iris and the breast cancer benchmark data sets [1]. On the latter tasks, a large margin on all training points cannot be achieved without adverse effects on the generalization error. This experimental observation was supported by the study of [13] where the generalization error of ensemble methods was bounded by the sum of the fraction of training points which have a margin smaller than some value p, say, plus a complexity term depending on the base hypotheses and p. While this bound can only capture part of what is going on in practice, it nevertheless already conveys the message that in some cases it pays to allow for some points which have a small margin, or are misclassified, if this leads to a larger overall margin on the remaining points. To cope with this problem, it was mandatory to construct regularized variants of AdaBoost, which traded off the number of margin errors and the size of the margin 562 G. Riitsch, B. Sch6lkopf, A. J. Smola, K.-R.

adaboost, algorithm, ensemble learning, (16 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Oceania > Australia > Queensland (0.04)
(6 more...)

Industry:

Health & Medicine > Therapeutic Area (0.54)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

The Infinite Gaussian Mixture Model

Rasmussen, Carl Edward

In a Bayesian mixture model it is not necessary a priori to limit the number of components to be finite. In this paper an infinite Gaussian mixture model is presented which neatly sidesteps the difficult problem of finding the "right" number of mixture components. Inference in the model is done using an efficient parameter-free Markov Chain that relies entirely on Gibbs sampling.

indicator, mixture model, unrepresented class, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Denmark > Capital Region > Kongens Lyngby (0.14)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)

Platt, John C., Cristianini, Nello, Shawe-Taylor, John

Large Margin DAGs for Multiclass Classification

We present a new learning architecture: the Decision Directed Acyclic Graph (DDAG), which is used to combine many two-class classifiers into a multiclass classifier. For an N -class problem, the DDAG contains N(N - 1)/2 classifiers, one for each pair of classes. We present a VC analysis of the case when the node classifiers are hyperplanes; the resulting bound on the test error depends on N and on the margin achieved at the nodes, but not on the dimension of the space. This motivates an algorithm, DAGSVM, which operates in a kernel-induced feature space and uses two-class maximal margin hyperplanes at each decision-node of the DDAG. The DAGSVM is substantially faster to train and evaluate than either the standard algorithm or Max Wins, while maintaining comparable accuracy to both of these algorithms. 1 Introduction The problem of multiclass classificatIon, especially for systems like SVMs, doesn't present an easy solution. It is generally simpler to construct classifier theory and algorithms for two mutually-exclusive classes than for N mutually-exclusive classes.

algorithm, classifier, node, (15 more...)

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Washington > King County > Redmond (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Ormoneit, Dirk, Hastie, Trevor

Optimal Kernel Shapes for Local Linear Regression

Local linear regression performs very well in many low-dimensional forecasting problems. In high-dimensional spaces, its performance typically decays due to the well-known "curse-of-dimensionality". A possible way to approach this problem is by varying the "shape" of the weighting kernel. In this work we suggest a new, data-driven method to estimating the optimal kernel shape. Experiments using an artificially generated data set and data from the UC Irvine repository show the benefits of kernel shaping. 1 Introduction Local linear regression has attracted considerable attention in both statistical and machine learning literature as a flexible tool for nonparametric regression analysis [Cle79, FG96, AMS97]. Like most statistical smoothing approaches, local modeling suffers from the so-called "curse-of-dimensionality", the well-known fact that the proportion of the training data that lie in a fixed-radius neighborhood of a point decreases to zero at an exponential rate with increasing dimension of the input space.

kernel, optimal kernel shape, weighting kernel, (11 more...)

Country: North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report > New Finding (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Ng, Andrew Y., Jordan, Michael I.

Approximate Inference A lgorithms for Two-Layer Bayesian Networks

We present a class of approximate inference algorithms for graphical models of the QMR-DT type. We give convergence rates for these algorithms and for the Jaakkola and Jordan (1999) algorithm, and verify these theoretical predictions empirically.

algorithm, jaakkola and jordan, probability, (14 more...)

Country:

Asia > Middle East > Jordan (0.30)
North America > United States > California > Alameda County > Berkeley (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Mika, Sebastian, Rätsch, Gunnar, Weston, Jason, Schölkopf, Bernhard, Smola, Alex J., Müller, Klaus-Robert

Invariant Feature Extraction and Classification in Kernel Spaces

In hyperspectral imagery one pixel typically consists of a mixture of the reflectance spectra of several materials, where the mixture coefficients correspond to the abundances of the constituting materials. We assume linear combinations of reflectance spectra with some additive normal sensor noise and derive a probabilistic MAP framework for analyzing hyperspectral data. As the material reflectance characteristics are not know a priori, we face the problem of unsupervised linear unmixing.

abundance, algorithm, endmember, (12 more...)

Country:

North America > United States > Nevada (0.04)
Europe > Germany > Berlin (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(9 more...)

Industry:

Health & Medicine (0.68)
Government > Regional Government > North America Government > United States Government (0.47)
Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)

A Multi-class Linear Learning Algorithm Related to Winnow

Mesterharm, Chris

In this paper, we present Committee, a new multi-class learning algorithm related to the Winnow family of algorithms. Committee is an algorithm for combining the predictions of a set of sub-experts in the online mistake-bounded model oflearning. A sub-expert is a special type of attribute that predicts with a distribution over a finite number of classes. Committee learns a linear function of sub-experts and uses this function to make class predictions. We provide bounds for Committee that show it performs well when the target can be represented by a few relevant sub-experts. We also show how Committee can be used to solve more traditional problems composed of attributes. This leads to a natural extension that learns on multi-class problems that contain both traditional attributes and sub-experts.

algorithm, transformation, winnow algorithm, (15 more...)

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Mason, Llew, Baxter, Jonathan, Bartlett, Peter L., Frean, Marcus R.

Boosting Algorithms as Gradient Descent

Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers [1, 18]. Loosely speaking, if a combination of classifiers correctly classifies most of the training data with a large margin, then its error probability is small. In [14] we gave improved upper bounds on the misclassification probability of a combined classifier in terms of the average over the training data of a certain cost function of the margins.

adaboost, algorithm, cost function, (16 more...)

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.05)
Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States > Oregon (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.43)

Margaritis, Dimitris, Thrun, Sebastian

Bayesian Network Induction via Local Neighborhoods

In recent years, Bayesian networks have become highly successful tool for diagnosis, analysis, and decision making in real-world domains. We present an efficient algorithm for learning Bayes networks from data.

algorithm, bayesian network induction, markov blanket, (13 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.62)