AITopics

This paper introduces an algorithm for the automatic relevance determination of input variables in kernelized Support Vector Machines. Relevance is measured by scale factors defining the input space metric, and feature selection is performed by assigning zero weights to irrelevant variables. The metric is automatically tuned by the minimization of the standard SVM empirical risk, where scale factors are added to the usual set of parameters defining the classifier. Feature selection is achieved by constraints encouraging the sparsity of scale factors. The resulting algorithm compares favorably to state-of-the-art feature selection procedures and demonstrates its effectiveness on a demanding facial expression recognition problem.

algorithm, classifier, selection, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > France > Normandy > Seine-Maritime > Rouen (0.05)
Europe > France > Hauts-de-France > Oise > Compiègne (0.05)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.94)

Hochreiter, Sepp, Mozer, Michael C., Obermayer, Klaus

Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems

We introduce a family of classifiers based on a physical analogy to an electrostatic system of charged conductors. The family, called Coulomb classifiers, includes the two best-known support-vector machines (SVMs), the ν-SVM and the C-SVM. In the electrostatics analogy, a training example corresponds to a charged conductor at a given location in space, the classification function corresponds to the electrostatic potential function, and the training objective function corresponds to the Coulomb energy. The electrostatic framework provides not only a novel interpretation of existing algorithms and their interrelationships, but it suggests a variety of new methods for SVMs including kernels that bridge the gap between polynomial and radial-basis functions, objective functions that do not require positive-definite kernels, regularization techniques that allow for the construction of an optimal classifier in Minkowski space. Based on the framework, we propose novel SVMs and perform simulation studies to show that they are comparable or superior to standard SVMs. The experiments include classification tasks on data which are represented in terms of their pairwise proximities, where a Coulomb Classifier outperformed standard SVMs.

conductor, electrostatic system, kernel, (14 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
Europe > Germany > Berlin (0.04)

Genre: Instructional Material (0.48)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Fung, Glenn M., Mangasarian, Olvi L., Shavlik, Jude W.

Knowledge-Based Support Vector Machine Classifiers

Prior knowledge in the form of multiple polyhedral sets, each belonging to one of two categories, is introduced into a reformulation of a linear support vector machine classifier. The resulting formulation leads to a linear program that can be solved efficiently. Real world examples, from DNA sequencing and breast cancer prognosis, demonstrate the effectiveness of the proposed method. Numerical results show improvement in test set accuracy after the incorporation of prior knowledge into ordinary, data-based linear support vector machine classifiers. One experiment also shows that a linear classifier, based solely on prior knowledge, far outperforms the direct application of prior knowledge rules to classify data.

classifier, formulation, knowledge, (11 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.29)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.05)
(3 more...)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Ong, Cheng S., Williamson, Robert C., Smola, Alex J.

Hyperkernels

We consider the problem of choosing a kernel suitable for estimation using a Gaussian Process estimator or a Support Vector Machine. A novel solution is presented which involves defining a Reproducing Kernel Hilbert Space on the space of kernels itself. By utilizing an analog of the classical representer theorem, the problem of choosing a kernel from a parameterized family of kernels (e.g. of varying width) is reduced to a statistical estimation problem akin to the problem of minimizing a regularized risk functional. Various classical settings for model or kernel selection are special cases of our framework.

hyperkernel, kernel, reproducing kernel hilbert space, (13 more...)

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)

Bousquet, Olivier, Herrmann, Daniel

On the Complexity of Learning the Kernel Matrix

We investigate data based procedures for selecting the kernel when learning with Support Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow to vary the spectrum and keep the eigenvectors fix. This bound is only a logarithmic factor bigger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting. We thus propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.

complexity, kernel, kernel matrix, (15 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)

Williams, Christopher, Shawe-taylor, John S.

The Stability of Kernel Principal Components Analysis and its Relation to the Process Eigenspectrum

In this paper we analyze the relationships between the eigenvalues of the m x m Gram matrix K for a kernel k(·,.) We bound the dif between the two spectra and provide a performance bound on kernel peA. 1 Introduction Over recent years there has been a considerable amount of interest in kernel methods for supervised learning (e.g. Support Vector Machines and Gaussian Process predict ion) and for unsupervised learning (e.g. In this paper we study the stability of the subspace of feature space extracted by kernel peA with respect to the sample of size m, and relate this to the feature space that would be extracted in the infinite sample-size limit. This analysis essentially "lifts" into (a potentially infinite dimensional) feature space an analysis which can also be carried out for peA, comparing the k-dimensional eigenspace extracted from a sample covariance matrix and the k-dimensional eigenspace extracted from the population covariance matrix, and comparing the residuals from the k-dimensional compression for the m-sample and the population.

eigenvalue, feature space, matrix, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.41)

Scott, Clayton, Nowak, Robert

Dyadic Classification Trees via Structural Risk Minimization

Classification trees are one of the most popular types of classifiers, with ease of implementation and interpretation being among their attractive features. Despite the widespread use of classification trees, theoretical analysis of their performance is scarce. In this paper, we show that a new family of classification trees, called dyadic classification trees (DCTs), are near optimal (in a minimax sense) for a very broad range of classification problems. This demonstrates that other schemes (e.g., neural networks, support vector machines) cannot perform significantly better than DCTs in many cases. We also show that this near optimal performance is attained with linear (in the number of training data) complexity growing and pruning algorithms. Moreover, the performance of DCTs on benchmark datasets compares favorably to that of standard CART, which is generally more computationally intensive and which does not possess similar near optimality properties. Our analysis stems from theoretical results on structural risk minimization, on which the pruning rule for DCTs is based.

classification tree, classifier, dyadic classification tree, (13 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.05)
North America > United States > Wisconsin (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)

Hochreiter, Sepp, Mozer, Michael C., Obermayer, Klaus

Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems

We introduce a family of classifiers based on a physical analogy to an electrostatic system of charged conductors. The family, called Coulomb classifiers, includes the two best-known support-vector machines (SVMs), the ν-SVM and the C-SVM. In the electrostatics analogy,a training example corresponds to a charged conductor at a given location in space, the classification function corresponds to the electrostatic potential function, and the training objective function corresponds to the Coulomb energy. The electrostatic framework provides not only a novel interpretation of existing algorithms andtheir interrelationships, but it suggests a variety of new methods for SVMs including kernels that bridge the gap between polynomial and radial-basis functions, objective functions that do not require positive-definite kernels, regularization techniques that allow for the construction of an optimal classifier in Minkowski space. Based on the framework, we propose novel SVMs and perform simulationstudies to show that they are comparable or superior tostandard SVMs. The experiments include classification tasks on data which are represented in terms of their pairwise proximities, wherea Coulomb Classifier outperformed standard SVMs.

conductor, electrostatic system, kernel, (14 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
Europe > Germany > Berlin (0.04)

Genre: Instructional Material (0.48)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Hochreiter, Sepp, Obermayer, Klaus

Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

We investigate the problem of learning a classification task for datasets which are described by matrices. Rows and columns of these matrices correspond to objects, where row and column objects may belong to different sets, and the entries in the matrix express the relationships between them. We interpret the matrix elements as being produced by an unknown kernel which operates on object pairs and we show that - under mild assumptions - these kernels correspond to dot products in some (unknown) feature space. Minimizing a bound for the generalization error of a linear classifier which has been obtained using covering numbers we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices which are not positive definite, and not even symmetric or square.

artificial intelligence, classifier, machine learning, (16 more...)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.49)
Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)

Herbrich, Ralf, Lawrence, Neil D., Seeger, Matthias

Fast Sparse Gaussian Process Methods: The Informative Vector Machine

We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles,previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated inO(d) rather than O(n), d n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements.

artificial intelligence, machine learning, selection, (19 more...)

Country: Europe > United Kingdom (0.14)

Technology:

Information Technology > Modeling & Simulation (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)