Text Classification using String Kernels
Lodhi, Huma; Shawe-Taylor, John; Cristianini, Nello; Watkins, Christopher J. C. H.
–Neural Information Processing Systems
A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. Computing this feature vector directly would be prohibitively expensive even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this, the inner product can be evaluated efficiently by a dynamic programming technique. Kernel methods provide an effective alternative to explicit feature extraction. Learning then takes place in the feature space, provided the learning algorithm can be rewritten entirely so that data points appear only inside dot products with other data points.
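To make the dynamic programming idea concrete, here is a minimal Python sketch. It is not the paper's algorithm as published: it is a simplified, unweighted variant in which every occurrence of a length-k subsequence counts equally, whereas the published kernel additionally down-weights occurrences that contain gaps. The function names `subsequence_kernel` and `explicit_kernel` are hypothetical and chosen only for illustration. The first function computes the inner product without ever building the feature vector; the second builds the features explicitly and serves only as a brute-force check.

```python
from collections import Counter
from itertools import combinations


def subsequence_kernel(s, t, k):
    """Inner product of the length-k subsequence count vectors of s and t.

    Simplified, unweighted sketch (an assumption of this example): each
    occurrence of a subsequence contributes 1, regardless of gaps.
    Runs in O(k * len(s) * len(t)) time.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n, m = len(s), len(t)
    # prev[a][b]: number of matching occurrence pairs of length p - 1
    # drawn from the prefixes s[:a] and t[:b]. For length 0 there is
    # exactly one pair (the empty subsequence) for every prefix pair.
    prev = [[1] * (m + 1) for _ in range(n + 1)]
    for _ in range(k):
        cur = [[0] * (m + 1) for _ in range(n + 1)]
        for a in range(1, n + 1):
            for b in range(1, m + 1):
                # Inclusion-exclusion over pairs that avoid the last
                # character of s[:a] or the last character of t[:b].
                cur[a][b] = cur[a - 1][b] + cur[a][b - 1] - cur[a - 1][b - 1]
                # Pairs that end on both last characters need them to match.
                if s[a - 1] == t[b - 1]:
                    cur[a][b] += prev[a - 1][b - 1]
        prev = cur
    return prev[n][m]


def explicit_kernel(s, t, k):
    """Brute-force check: enumerate every length-k subsequence explicitly."""
    fs = Counter("".join(c) for c in combinations(s, k))
    ft = Counter("".join(c) for c in combinations(t, k))
    return sum(count * ft[u] for u, count in fs.items())


if __name__ == "__main__":
    print(subsequence_kernel("cat", "car", 2))
    print(explicit_kernel("cat", "car", 2))
```

The brute-force version enumerates every combination of k positions in each string, which becomes impractical quickly as k and the string lengths grow, while the dynamic programme touches each pair of positions once per subsequence length. Because the result is an ordinary inner product, it could be supplied to any learning algorithm that accesses the data only through dot products.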
Dec-31-2001