AITopics

A new method for multivariate density estimation is developed based on the Support Vector Method (SVM) solution of inverse ill-posed problems. The solution has the form of a mixture of densities. This method with Gaussian kernels compared favorably to both Parzen's method and the Gaussian Mixture Model method. For synthetic data we achieve more accurate estimates for densities of 2, 6, 12, and 40 dimensions. 1 Introduction The problem of multivariate density estimation is important for many applications, in particular, for speech recognition [1] [7]. When the unknown density belongs to a parametric set satisfying certain conditions one can estimate it using the maximum likelihood (ML) method. Often these conditions are too restrictive. Therefore, nonparametric methods were proposed. The most popular of these, Parzen's method [5], uses the following estimate given data

parzen, support vector method, svm method, (13 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.63)

The Relevance Vector Machine

Tipping, Michael E.

The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a tradeoff parameter and the need to utilise'Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment of a generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions.

classification, dataset, kernel function, (13 more...)

Country:

North America > United States > New York (0.05)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.91)

On Input Selection with Reversible Jump Markov Chain Monte Carlo Sampling

Sykacek, Peter

In this paper we will treat input selection for a radial basis function (RBF) like classifier within a Bayesian framework. We approximate the a-posteriori distribution over both model coefficients and input subsets by samples drawn with Gibbs updates and reversible jump moves. Using some public datasets, we compare the classification accuracy of the method with a conventional ARD scheme. These datasets are also used to infer the a-posteriori probabilities of different input subsets. 1 Introduction Methods that aim to determine relevance of inputs have always interested researchers in various communities. Classical feature subset selection techniques, as reviewed in [1], use search algorithms and evaluation criteria to determine one optimal subset.

coefficient, probability, subset, (14 more...)

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.43)

Slonim, Noam, Tishby, Naftali

Agglomerative Information Bottleneck

This question was recently shown in [9] to be a special case of a much more fundamental problem: What are the features of the variable X that are relevant for the prediction of another, relevance, variable Y?

algorithm, information, mutual information, (14 more...)

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Leveraged Vector Machines

Singer, Yoram

We describe an iterative algorithm for building vector machines used in classification tasks. The algorithm builds on ideas from support vector machines, boosting, and generalized additive models. The algorithm can be used with various continuously differential functions that bound the discrete (0-1) classification loss and is very simple to implement. We test the proposed algorithm with two different loss functions on synthetic and natural data. We also describe a norm-penalized version of the algorithm for the exponential loss function used in AdaBoost.

algorithm, loss function, vector machine, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Bayesian Model Selection for Support Vector Machines, Gaussian Processes and Other Kernel Classifiers

Seeger, Matthias

We present a variational Bayesian method for model selection over families of kernels classifiers like Support Vector machines or Gaussian processes. The algorithm needs no user interaction and is able to adapt a large number of kernel parameters to given data without having to sacrifice training cases for validation. This opens the possibility to use sophisticated families of kernels in situations where the small "standard kernel" classes are clearly inappropriate. We relate the method to other work done on Gaussian processes and clarify the relation between Support Vector machines and certain Gaussian process models.

approximation, gaussian process, support vector machine, (11 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Wisconsin (0.05)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.72)

Schölkopf, Bernhard, Williamson, Robert C., Smola, Alex J., Shawe-Taylor, John, Platt, John C.

Support Vector Method for Novelty Detection

Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified

algorithm, feature space, outlier, (15 more...)

Country:

North America > United States > Washington > King County > Redmond (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.66)

Roth, Volker, Steinhage, Volker

Nonlinear Discriminant Analysis Using Kernel Functions

Fishers linear discriminant analysis (LDA) is a classical multivariate technique both for dimension reduction and classification. The data vectors are transformed into a low dimensional subspace such that the class centroids are spread out as much as possible. In this subspace LDA works as a simple prototype classifier with linear decision boundaries. However, in many applications the linear boundaries do not adequately separate the classes. We present a nonlinear generalization of discriminant analysis that uses the kernel trick of representing dot products by kernel functions.

algorithm, discriminant analysis, vector, (16 more...)

Country:

North America > United States > California > Monterey County > Monterey (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.74)

Rätsch, Gunnar, Schölkopf, Bernhard, Smola, Alex J., Müller, Klaus-Robert, Onoda, Takashi, Mika, Sebastian

v-Arc: Ensemble Learning in the Presence of Outliers

The idea of a large minimum margin [17] explains the good generalization performance of AdaBoost in the low noise regime. However, AdaBoost performs worse on noisy tasks [10, 11], such as the iris and the breast cancer benchmark data sets [1]. On the latter tasks, a large margin on all training points cannot be achieved without adverse effects on the generalization error. This experimental observation was supported by the study of [13] where the generalization error of ensemble methods was bounded by the sum of the fraction of training points which have a margin smaller than some value p, say, plus a complexity term depending on the base hypotheses and p. While this bound can only capture part of what is going on in practice, it nevertheless already conveys the message that in some cases it pays to allow for some points which have a small margin, or are misclassified, if this leads to a larger overall margin on the remaining points. To cope with this problem, it was mandatory to construct regularized variants of AdaBoost, which traded off the number of margin errors and the size of the margin 562 G. Riitsch, B. Sch6lkopf, A. J. Smola, K.-R.

adaboost, algorithm, ensemble learning, (16 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Oceania > Australia > Queensland (0.04)
(6 more...)

Industry:

Health & Medicine > Therapeutic Area (0.54)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Platt, John C., Cristianini, Nello, Shawe-Taylor, John

Large Margin DAGs for Multiclass Classification

We present a new learning architecture: the Decision Directed Acyclic Graph (DDAG), which is used to combine many two-class classifiers into a multiclass classifier. For an N -class problem, the DDAG contains N(N - 1)/2 classifiers, one for each pair of classes. We present a VC analysis of the case when the node classifiers are hyperplanes; the resulting bound on the test error depends on N and on the margin achieved at the nodes, but not on the dimension of the space. This motivates an algorithm, DAGSVM, which operates in a kernel-induced feature space and uses two-class maximal margin hyperplanes at each decision-node of the DDAG. The DAGSVM is substantially faster to train and evaluate than either the standard algorithm or Max Wins, while maintaining comparable accuracy to both of these algorithms. 1 Introduction The problem of multiclass classificatIon, especially for systems like SVMs, doesn't present an easy solution. It is generally simpler to construct classifier theory and algorithms for two mutually-exclusive classes than for N mutually-exclusive classes.

algorithm, classifier, node, (15 more...)

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Washington > King County > Redmond (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)