AITopics

This question was recently shown in [9] to be a special case of a much more fundamental problem: What are the features of the variable X that are relevant for the prediction of another, relevance, variable Y?

artificial intelligence, information, machine learning, (16 more...)

Country:

North America > United States > Ohio (0.14)
Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Bayesian Model Selection for Support Vector Machines, Gaussian Processes and Other Kernel Classifiers

Seeger, Matthias

We present a variational Bayesian method for model selection over families of kernel classifiers like Support Vector machines or Gaussian processes. The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training cases for validation. This opens the possibility to use sophisticated families of kernels in situations where the small "standard kernel" classes are clearly inappropriate. We relate the method to other work done on Gaussian processes and clarify the relation between Support Vector machines and certain Gaussian process models.

1 Introduction

Bayesian techniques have been widely and successfully used in the neural networks and statistics community and are appealing because of their conceptual simplicity, generality and the consistency with which they solve learning problems. In this paper we present a new method for applying the Bayesian methodology to Support Vector machines. We will briefly review Gaussian Process and Support Vector classification in this section and clarify their relationship by pointing out the common roots. Although we focus on classification here, it is straightforward to apply the methods to regression problems as well. In section 2 we introduce our algorithm and show relations to existing methods. Finally, we present experimental results in section 3 and close with a discussion in section 4. Let X be a measure space (e.g.

artificial intelligence, machine learning, support vector machine, (13 more...)

Country: Europe (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.72)

Rätsch, Gunnar, Schölkopf, Bernhard, Smola, Alex J., Müller, Klaus-Robert, Onoda, Takashi, Mika, Sebastian

ν-Arc: Ensemble Learning in the Presence of Outliers

The idea of a large minimum margin [17] explains the good generalization performance of AdaBoost in the low noise regime. However, AdaBoost performs worse on noisy tasks [10, 11], such as the iris and the breast cancer benchmark data sets [1]. On the latter tasks, a large margin on all training points cannot be achieved without adverse effects on the generalization error. This experimental observation was supported by the study of [13], where the generalization error of ensemble methods was bounded by the sum of the fraction of training points which have a margin smaller than some value ρ, say, plus a complexity term depending on the base hypotheses and ρ. While this bound can only capture part of what is going on in practice, it nevertheless already conveys the message that in some cases it pays to allow for some points which have a small margin, or are misclassified, if this leads to a larger overall margin on the remaining points. To cope with this problem, it was mandatory to construct regularized variants of AdaBoost, which traded off the number of margin errors and the size of the margin.
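
The quantity that the bound above depends on, the fraction of training points whose margin falls below a threshold, can be illustrated with a small sketch. This is not code from the paper: the function names, the normalisation of the margin by the sum of absolute ensemble weights, and the toy numbers are all assumptions made for illustration.

```python
def margins(weights, predictions, labels):
    """Normalised margin y * sum_t w_t h_t(x) / sum_t |w_t| for each point.

    predictions[i][t] is the +1/-1 output of base hypothesis t on point i.
    """
    norm = sum(abs(w) for w in weights)
    return [
        y * sum(w * h for w, h in zip(weights, preds)) / norm
        for preds, y in zip(predictions, labels)
    ]


def margin_error_fraction(ms, threshold):
    """Fraction of points whose margin is strictly below the threshold."""
    return sum(1 for m in ms if m < threshold) / len(ms)


# Hypothetical toy ensemble: three base hypotheses, three training points.
weights = [2, 1, 1]
predictions = [[1, 1, 1], [1, -1, -1], [-1, -1, 1]]
labels = [1, 1, -1]

ms = margins(weights, predictions, labels)
low = margin_error_fraction(ms, 0.25)
high = margin_error_fraction(ms, 0.75)
```

Raising the threshold counts more points as margin errors, which is the trade-off the regularized variants are described as controlling.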

algorithm, artificial intelligence, machine learning, (18 more...)

Country:

North America > United States (0.28)
Oceania > Australia (0.28)
Europe (0.28)
Asia > Japan (0.28)

Industry:

Health & Medicine > Therapeutic Area (0.54)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Mika, Sebastian, Rätsch, Gunnar, Weston, Jason, Schölkopf, Bernhard, Smola, Alex J., Müller, Klaus-Robert

Invariant Feature Extraction and Classification in Kernel Spaces

In hyperspectral imagery one pixel typically consists of a mixture of the reflectance spectra of several materials, where the mixture coefficients correspond to the abundances of the constituting materials. We assume linear combinations of reflectance spectra with some additive normal sensor noise and derive a probabilistic MAP framework for analyzing hyperspectral data. As the material reflectance characteristics are not known a priori, we face the problem of unsupervised linear unmixing.

algorithm, artificial intelligence, machine learning, (14 more...)

Country:

North America > United States (1.00)
Europe (0.93)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)

Mason, Llew, Baxter, Jonathan, Bartlett, Peter L., Frean, Marcus R.

Boosting Algorithms as Gradient Descent

Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers [1, 18]. Loosely speaking, if a combination of classifiers correctly classifies most of the training data with a large margin, then its error probability is small. In [14] we gave improved upper bounds on the misclassification probability of a combined classifier in terms of the average over the training data of a certain cost function of the margins. That paper also described DOOM, an algorithm for directly minimizing the margin cost function by adjusting the weights associated with each base classifier (the base classifiers are supplied to DOOM). DOOM exhibits performance improvements over AdaBoost, even when using the same base hypotheses, which provides additional empirical evidence that these margin cost functions are appropriate quantities to optimize. In this paper, we present a general class of algorithms (called AnyBoost) which are gradient descent algorithms for choosing linear combinations of elements of an inner product function space so as to minimize some cost functional. The normal operation of a weak learner is shown to be equivalent to maximizing a certain inner product. We prove convergence of AnyBoost under weak conditions. In Section 3, we show that this general class of algorithms includes as special cases nearly all existing voting methods.
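
The gradient-descent view described above can be sketched in a few lines. The following is an illustrative sketch only, not the authors' implementation. It assumes an exponential margin cost, decision stumps on 1-D inputs as base classifiers, and a closed-form line-search step; under those assumptions the updates coincide with the familiar AdaBoost weights. All function names and the toy data are hypothetical.

```python
import math


def stump(threshold, sign):
    """Base classifier on 1-D inputs: +sign above the threshold, -sign otherwise."""
    return lambda x: sign if x > threshold else -sign


def exp_cost(F, ys):
    """Average exponential cost of the margins y * F(x)."""
    return sum(math.exp(-y * f) for y, f in zip(ys, F)) / len(ys)


def boost(xs, ys, candidates, rounds):
    """Functional gradient descent on the exponential margin cost."""
    ensemble = []          # list of (weight, base classifier) pairs
    F = [0.0] * len(xs)    # combined classifier output on each training point
    for _ in range(rounds):
        # Negative cost derivative at each margin, normalised to a distribution.
        raw = [math.exp(-y * f) for y, f in zip(ys, F)]
        total = sum(raw)
        D = [r / total for r in raw]

        def edge(h):
            # Inner product between candidate h and the negative gradient.
            return sum(d * y * h(x) for d, y, x in zip(D, ys, xs))

        # "Weak learner": choose the candidate that maximises the inner product.
        best = max(candidates, key=edge)
        e = min(edge(best), 1.0 - 1e-12)  # keep the step finite
        if e <= 0.0:
            break  # no candidate is a descent direction
        # Exact line search for the exponential cost with +1/-1 classifiers.
        w = 0.5 * math.log((1.0 + e) / (1.0 - e))
        ensemble.append((w, best))
        F = [f + w * best(x) for f, x in zip(F, xs)]
    return ensemble, F


# Toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]
candidates = [stump(t - 0.5, s) for t in range(6) for s in (1, -1)]
ensemble, F = boost(xs, ys, candidates, rounds=3)
```

Swapping `exp_cost` and the step-size rule for another differentiable margin cost changes which voting method the loop reproduces; only the weighting `D` and the line search depend on the cost.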

algorithm, artificial intelligence, machine learning, (18 more...)

Country:

Oceania > Australia > Queensland (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.83)

Margaritis, Dimitris, Thrun, Sebastian

Bayesian Network Induction via Local Neighborhoods

In recent years, Bayesian networks have become a highly successful tool for diagnosis, analysis, and decision making in real-world domains. We present an efficient algorithm for learning Bayes networks from data.

artificial intelligence, bayesian inference, machine learning, (16 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.62)

Lee, Daniel D., Rokni, Uri, Sompolinsky, Haim

Algorithms for Independent Components Analysis and Higher Order Statistics

A latent variable generative model with finite noise is used to describe several different algorithms for Independent Components Analysis (ICA). In particular, the Fixed Point ICA algorithm is shown to be equivalent to the Expectation-Maximization algorithm for maximum likelihood under certain constraints, allowing the conditions for global convergence to be elucidated. The algorithms can also be explained by their generic behavior near a singular point where the size of the optimal generative bases vanishes. An expansion of the likelihood about this singular point indicates the role of higher order correlations in determining the features discovered by ICA. The application and convergence of these algorithms are demonstrated on a simple illustrative example.

artificial intelligence, bayesian inference, machine learning, (13 more...)

Country: Asia > Middle East > Israel (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

An Improved Decomposition Algorithm for Regression Support Vector Machines

Laskov, Pavel

The algorithm builds on the basic principles of decomposition proposed by Osuna et al.

algorithm, artificial intelligence, machine learning, (11 more...)

Country: North America > United States > Delaware > New Castle County > Newark (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)

Jojic, Nebojsa, Frey, Brendan J.

Topographic Transformation as a Discrete Latent Variable

We describe a way to add transformation invariance to a generative density model by approximating the nonlinear transformation manifold by a discrete set of transformations. An EM algorithm for the original model can be extended to the new model by computing expectations over the set of transformations. We show how to add a discrete transformation variable to Gaussian mixture modeling, factor analysis and mixtures of factor analysis. We give results on filtering microscopy images, face and facial pose clustering, and handwritten digit modeling and recognition.
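
A minimal sketch of the idea of taking expectations over a discrete set of transformations is given here. It is a deliberate simplification and not the authors' model: it assumes a single Gaussian with one template `mu` and isotropic noise (rather than a mixture or factor analyzer), 1-D signals, circular shifts as the transformation set, and a uniform prior over shifts. All names are hypothetical.

```python
import math


def shift(v, k):
    """Circularly shift list v to the right by k positions (k may be negative)."""
    k %= len(v)
    return v[-k:] + v[:-k] if k else list(v)


def posterior_over_shifts(x, mu, sigma2):
    """E-step: p(k | x) when x ~ N(shift(mu, k), sigma2 * I), uniform prior on k."""
    log_p = [
        -sum((a - b) ** 2 for a, b in zip(x, shift(mu, k))) / (2.0 * sigma2)
        for k in range(len(mu))
    ]
    top = max(log_p)  # subtract the maximum for numerical stability
    w = [math.exp(lp - top) for lp in log_p]
    z = sum(w)
    return [wi / z for wi in w]


def em_step(data, mu, sigma2):
    """One EM update of the template, averaging over every possible shift."""
    n = len(mu)
    new_mu = [0.0] * n
    for x in data:
        post = posterior_over_shifts(x, mu, sigma2)
        for k, p in enumerate(post):
            aligned = shift(x, -k)  # undo shift k before accumulating
            for i in range(n):
                new_mu[i] += p * aligned[i]
    return [v / len(data) for v in new_mu]


# Toy data: the same template observed under every circular shift.
template = [1.0, 0.0, 0.0, 0.0]
data = [shift(template, k) for k in range(4)]
mu0 = [0.9, 0.1, 0.0, 0.0]
post = posterior_over_shifts(data[2], mu0, sigma2=0.05)
mu1 = em_step(data, mu0, sigma2=0.05)
```

Because each observation is un-shifted under its posterior before averaging, the template update pools all four shifted copies instead of blurring them together.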

artificial intelligence, machine learning, transformation, (16 more...)

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Hinton, Geoffrey E., Ghahramani, Zoubin, Teh, Yee Whye

Learning to Parse Images

We describe a class of probabilistic models that we call credibility networks. Using parse trees as internal representations of images, credibility networks are able to perform segmentation and recognition simultaneously, removing the need for ad hoc segmentation heuristics. Promising results in the problem of segmenting handwritten digits were obtained.

artificial intelligence, machine learning, natural language, (18 more...)

Country:

Europe > United Kingdom (0.28)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.95)
(2 more...)