AITopics

This paper presents two new formulations of multiple-instance learning as a maximum margin problem. The proposed extensions of the Support Vector Machine (SVM) learning approach lead to mixed integer quadratic programs that can be solved heuristically. Our generalization of SVMs makes a state-of-the-art classification technique, including nonlinear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods. We present experimental results on a pharmaceutical dataset and on applications in automated image indexing and document categorization. 1 Introduction Multiple-instance learning (MIL) [4] is a generalization of supervised classification in which training class labels are associated with sets of patterns, or bags, instead of individual patterns. While every pattern may possess an associated true label, it is assumed that pattern labels are only indirectly accessible through labels attached to bags.

artificial intelligence, integer variable, machine learning, (15 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.15)

Genre: Research Report (0.47)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Grandvalet, Yves, Canu, Stéphane

Adaptive Scaling for Feature Selection in SVMs

This paper introduces an algorithm for the automatic relevance determination ofinput variables in kernelized Support Vector Machines. Relevance is measured by scale factors defining the input space metric, and feature selection is performed by assigning zero weights to irrelevant variables. The metric is automatically tuned by the minimization of the standard SVM empirical risk, where scale factors are added to the usual set of parameters defining the classifier. Feature selection is achieved by constraints encouraging the sparsity of scale factors. The resulting algorithm compares favorably to state-of-the-art feature selection procedures anddemonstrates its effectiveness on a demanding facial expression recognition problem.

algorithm, artificial intelligence, machine learning, (16 more...)

Country:

Europe > France (0.29)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.94)

Fung, Glenn M., Mangasarian, Olvi L., Shavlik, Jude W.

Knowledge-Based Support Vector Machine Classifiers

Prior knowledge in the form of multiple polyhedral sets, each belonging toone of two categories, is introduced into a reformulation of a linear support vector machine classifier. The resulting formulation leadsto a linear program that can be solved efficiently. Real world examples, from DNA sequencing and breast cancer prognosis, demonstrate the effectiveness of the proposed method. Numerical results show improvement in test set accuracy after the incorporation ofprior knowledge into ordinary, data-based linear support vector machine classifiers. One experiment also shows that a linear classifier,based solely on prior knowledge, far outperforms the direct application of prior knowledge rules to classify data.

artificial intelligence, knowledge, machine learning, (14 more...)

Country: North America > United States > Wisconsin > Dane County > Madison (0.29)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Williams, Christopher, Shawe-taylor, John S.

The Stability of Kernel Principal Components Analysis and its Relation to the Process Eigenspectrum

I. Williams School of Informatics University of Edinburgh c.k.i.williams ed.ac.uk Abstract In this paper we analyze the relationships between the eigenvalues of the m x m Gram matrix K for a kernel k(·, .) We bound the differences betweenthe two spectra and provide a performance bound on kernel peA. 1 Introduction Over recent years there has been a considerable amount of interest in kernel methods for supervised learning (e.g. Support Vector Machines and Gaussian Process predict ion)and for unsupervised learning (e.g. In this paper we study the stability of the subspace of feature space extracted by kernel peA with respect to the sample of size m, and relate this to the feature space that would be extracted in the infinite sample-size limit. This analysis essentially "lifts" into (a potentially infinite dimensional) feature space an analysis which can also be carried out for peA, comparing the k-dimensional eigenspace extracted from a sample covariance matrix and the k-dimensional eigenspace extracted from the population covariance matrix, and comparing the residuals from the k-dimensional compression for the m-sample and the population.

artificial intelligence, eigenvalue, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.41)

Scott, Clayton, Nowak, Robert

Dyadic Classification Trees via Structural Risk Minimization

Classification trees are one of the most popular types of classifiers, with ease of implementation and interpretation being among their attractive features. Despite the widespread use of classification trees, theoretical analysis of their performance is scarce. In this paper, we show that a new family of classification trees, called dyadic classification trees (DCTs), are near optimal (in a minimax sense) for a very broad range of classification problems.This demonstrates that other schemes (e.g., neural networks, support vector machines) cannot perform significantly better than DCTs in many cases. We also show that this near optimal performance isattained with linear (in the number of training data) complexity growing and pruning algorithms. Moreover, the performance of DCTs on benchmark datasets compares favorably to that of standard CART, which is generally more computationally intensive and which does not possess similar near optimality properties. Our analysis stems from theoretical resultson structural risk minimization, on which the pruning rule for DCTs is based.

artificial intelligence, classification tree, machine learning, (15 more...)

Country:

North America > United States (0.47)
Europe > United Kingdom > England (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)

Chen, Yiling, Councill, Isaac G.

An Introduction to Support Vector Machines: A Review

AI MagazineJun-15-2003

Kernel functions can implicitly combine these two steps (nonlinear mapping and linear learning) into one step in constructing a nonlinear learning machine.

artificial intelligence, linear learning machine, machine learning, (14 more...)

AI Magazine

Country: North America > United States (0.16)

Genre: Summary/Review (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Heisele, Bernd, Serre, Thomas, Pontil, Massimiliano, Vetter, Thomas, Poggio, Tomaso

Categorization by Learning and Combining Object Parts

We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.

artificial intelligence, classifier, machine learning, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Europe > Germany > Baden-Württemberg > Freiburg (0.05)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.94)

Fine, Shai, Scheinberg, Katya

Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM

We propose a framework based on a parametric quadratic programming (QP) technique to solve the support vector machine (SVM) training problem. This framework, can be specialized to obtain two SVM optimization methods. The first solves the fixed bias problem, while the second starts with an optimal solution for a fixed bias problem and adjusts the bias until the optimal value is found. The later method can be applied in conjunction with any other existing technique which obtains a fixed bias solution. Moreover, the second method can also be used independently to solve the complete SVM training problem. A combination of these two methods is more flexible than each individual method and, among other things, produces an incremental algorithm which exactly solve the 1-Norm Soft Margin SVM optimization problem. Applying Selective Sampling techniques may further boost convergence.

algorithm, iteration, optimal solution, (11 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.70)

Dietterich, Thomas G., Wang, Xin

Batch Value Function Approximation via Support Vectors

Virtually all existing work on value function approximation and policy-gradient methods starts with a parameterized formula for the value function or policy and thenseeks to find the best policythat canbe representedinthat parameterizedform. This can give rise to very difficult search problems for which the Bellman equation is of little or no use. In this paper, we take a different approach: rather than fixing the form of the function approximator and searching for a representable policy, we instead identify a good policy and then search for a function approximator that can represent it. Our approach exploits the ability of mathematical programming to represent a variety of constraints including those that derive from supervised learning, from advantage learning (Baird, 1993), and from the Bellman equation. By combining the kernel trick with mathematical programming, we obtain a function approximator that seeks to find the smallest number of support vectors sufficient to represent the desired policy.

formulation, function approximation, function approximator, (11 more...)

Country: North America > United States > Oregon > Benton County > Corvallis (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.63)

Speech Recognition using SVMs

Smith, N., Gales, Mark

An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space.

classifier, generative model, kernel, (13 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > United States > Utah (0.04)

Technology:

Information Technology > Artificial Intelligence > Speech (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)