Spectral Kernel Methods for Clustering
Cristianini, Nello, Shawe-Taylor, John, Kandola, Jaz S.
In this paper we introduce new algorithms for unsupervised learning based on the use of a kernel matrix. All the information required by such algorithms is contained in the eigenvectors of the matrix or of closely related matrices. We use two different but related cost functions, the Alignment and the 'cut cost'. The first is discussed in a companion paper [3]; the second is based on graph-theoretic concepts. Both functions measure the level of clustering of a labeled dataset, or the correlation between data clusters and labels.
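As a concrete illustration of the eigenvector-based clustering the abstract describes, here is a minimal NumPy sketch that assigns two clusters by thresholding the dominant eigenvector of a kernel matrix. Thresholding at zero and the function names are illustrative assumptions, not the paper's exact procedure (the paper chooses the split using its cost functions).

```python
import numpy as np

def spectral_bicluster(K):
    """Assign +/-1 cluster labels from the dominant eigenvector of a
    symmetric n x n kernel matrix K. Illustrative sketch: thresholding
    at zero is one simple choice of split."""
    eigvals, eigvecs = np.linalg.eigh(K)   # eigh returns ascending eigenvalues
    v = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    return np.where(v >= 0, 1, -1)

# Example: an RBF kernel over two well-separated point clouds.
X = np.vstack([np.random.randn(10, 2) - 3, np.random.randn(10, 2) + 3])
K = np.exp(-0.5 * np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2)
print(spectral_bicluster(K))
```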
On Kernel-Target Alignment
Cristianini, Nello, Shawe-Taylor, John, Elisseeff, André, Kandola, Jaz S.
We introduce the notion of kernel-alignment, a measure of similarity between two kernel functions or between a kernel and a target function. This quantity captures the degree of agreement between a kernel and a given learning task, and has very natural interpretations in machine learning, leading also to simple algorithms for model selection and learning.
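For reference, the alignment between two kernel matrices K1 and K2 is their Frobenius inner product normalized by their Frobenius norms, A(K1, K2) = <K1, K2>_F / (||K1||_F ||K2||_F); kernel-target alignment takes K2 = yy^T for a label vector y in {-1, +1}^n. A minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def alignment(K1, K2):
    """Frobenius-normalized inner product between two kernel matrices."""
    num = np.sum(K1 * K2)
    return num / (np.linalg.norm(K1, "fro") * np.linalg.norm(K2, "fro"))

def kernel_target_alignment(K, y):
    """Alignment between K and the ideal kernel yy^T for labels y in {-1, +1}."""
    return alignment(K, np.outer(y, y))
```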
Support Vector Machines and Kernel Methods: The New Generation of Learning Machines
Cristianini, Nello, Schölkopf, Bernhard
Kernel methods, a new generation of learning algorithms, utilize techniques from optimization, statistics, and functional analysis to achieve maximal generality, flexibility, and performance. These algorithms differ from earlier machine learning techniques in many respects: for example, they are explicitly based on a theoretical model of learning rather than on loose analogies with natural learning systems or other heuristics. They come with theoretical guarantees about their performance and have a modular design that makes it possible to separately implement and analyze their components. They are not affected by the problem of local minima because their training amounts to convex optimization. In the last decade, a sizable community of theoreticians and practitioners has formed around these methods, and a number of practical applications have been realized. Although research is ongoing, kernel methods are already considered the state of the art in several machine learning tasks. Their ease of use, theoretical appeal, and remarkable performance have made them the system of choice for many learning problems. Successful applications range from text categorization to handwriting recognition to classification of gene-expression data.
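A small sketch of the modularity the abstract mentions: the same convex training routine can be paired with different kernels simply by swapping the kernel module. This uses scikit-learn's SVC purely as an illustration; the survey itself is not tied to any library.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Same learning algorithm, different kernels: only the kernel component changes.
for kernel in ("linear", "poly", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f"{kernel:>6}: {scores.mean():.3f}")
```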
Text Classification using String Kernels
Lodhi, Huma, Shawe-Taylor, John, Cristianini, Nello, Watkins, Christopher J. C. H.
We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences which are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. An effective alternative to explicit feature extraction is provided by kernel methods: the learning then takes place in the feature space, provided the learning algorithm can be entirely rewritten so that the data points only appear inside dot products with other data points.
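The dynamic programme the abstract refers to admits a compact recursive form. Below is a minimal memoized Python sketch of the gap-weighted subsequence kernel, following the standard recursion for K_n and its auxiliary K'_n; constant and function names are illustrative, and lambda is the gap-decay penalty. As a sanity check, with k=2 and lambda=0.5 the normalized kernel between "cat" and "car" is 1/(2 + lambda^2) ≈ 0.444, coming entirely from the shared subsequence "c-a".

```python
from functools import lru_cache

LAM = 0.5  # gap/decay penalty, 0 < LAM <= 1 (illustrative value)

@lru_cache(maxsize=None)
def _k_prime(i, s, t):
    # K'_i(s, t): length-i subsequence matches, weighted to the end of s.
    if i == 0:
        return 1.0
    if min(len(s), len(t)) < i:
        return 0.0
    head, x = s[:-1], s[-1]
    total = LAM * _k_prime(i, head, t)
    for j, c in enumerate(t):
        if c == x:
            total += _k_prime(i - 1, head, t[:j]) * LAM ** (len(t) - j + 1)
    return total

@lru_cache(maxsize=None)
def _k(n, s, t):
    # K_n(s, t): inner product in the space of all length-n subsequences.
    if min(len(s), len(t)) < n:
        return 0.0
    head, x = s[:-1], s[-1]
    total = _k(n, head, t)
    for j, c in enumerate(t):
        if c == x:
            total += _k_prime(n - 1, head, t[:j]) * LAM ** 2
    return total

def string_kernel(s, t, n):
    """Normalized subsequence kernel: K_n(s,t) / sqrt(K_n(s,s) * K_n(t,t))."""
    denom = (_k(n, s, s) * _k(n, t, t)) ** 0.5
    return _k(n, s, t) / denom if denom else 0.0

print(string_kernel("cat", "car", 2))  # ~0.444 with LAM = 0.5
```

This naive memoized form makes the recursion explicit; the paper's dynamic programming table achieves the same result with better constants.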
Large Margin DAGs for Multiclass Classification
Platt, John C., Cristianini, Nello, Shawe-Taylor, John
We present a new learning architecture: the Decision Directed Acyclic Graph (DDAG), which is used to combine many two-class classifiers into a multiclass classifier. For an N-class problem, the DDAG contains N(N - 1)/2 classifiers, one for each pair of classes. We present a VC analysis of the case when the node classifiers are hyperplanes; the resulting bound on the test error depends on N and on the margin achieved at the nodes, but not on the dimension of the space. This motivates an algorithm, DAGSVM, which operates in a kernel-induced feature space and uses two-class maximal margin hyperplanes at each decision node of the DDAG. The DAGSVM is substantially faster to train and evaluate than either the standard algorithm or Max Wins, while maintaining comparable accuracy to both of these algorithms. The problem of multiclass classification, especially for systems like SVMs, does not present an easy solution: it is generally simpler to construct classifier theory and algorithms for two mutually exclusive classes than for N mutually exclusive classes.
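To make the evaluation path concrete: classifying a point with a DDAG takes only N - 1 pairwise decisions, because each node eliminates one candidate class from a list until a single class remains. A minimal Python sketch of that elimination loop (the pairwise-classifier interface here is an assumption for illustration, not the paper's API):

```python
def ddag_predict(x, classes, pairwise):
    """Evaluate a DDAG on input x.

    classes  : list of the N class labels.
    pairwise : dict mapping a class pair (a, b) to a binary classifier
               f(x) -> a or b (illustrative interface).

    Each node tests the first remaining class against the last and
    eliminates the loser, so exactly N - 1 classifiers are evaluated.
    """
    lo, hi = 0, len(classes) - 1
    while lo < hi:
        a, b = classes[lo], classes[hi]
        if pairwise[(a, b)](x) == a:
            hi -= 1   # b eliminated
        else:
            lo += 1   # a eliminated
    return classes[lo]
```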