Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation
Goernitz, Nico, Widmer, Christian, Zeller, Georg, Kahles, Andre, Rätsch, Gunnar, Sonnenburg, Sören
We present a novel regularization-based Multitask Learning (MTL) formulation for Structured Output (SO) prediction for the case of hierarchical task relations. Structured output learning often results in difficult inference problems and requires large amounts of training data to obtain accurate models. We propose to use MTL to exploit information available for related structured output learning tasks by means of hierarchical regularization. Because example sets are combined across tasks, the cost of training models for structured output prediction can easily become prohibitive for real-world applications. We thus propose an efficient algorithm based on bundle methods to solve the optimization problems resulting from MTL structured output learning. We demonstrate the performance of our approach on gene finding problems from the application domain of computational biology. We show that 1) our proposed solver achieves much faster convergence than previous methods and 2) the hierarchical SO-MTL approach clearly outperforms the considered non-MTL methods.
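To make the hierarchical regularization concrete, here is a minimal Python sketch, not the paper's solver: per-task linear models whose weights are pulled toward their parent task's weights, trained with a plain squared loss and gradient descent rather than structured output losses and bundle methods. The function name, task layout, and hyperparameters are all illustrative.

    import numpy as np

    def hierarchical_mtl(tasks, parent, dim, lam=1.0, lr=0.01, steps=500):
        """Hypothetical sketch: minimize per-task squared losses plus the
        hierarchical coupling lam * sum_t ||w_t - w_parent(t)||^2.
        tasks: {name: (X, y)}; parent: {name: parent name or None}."""
        W = {t: np.zeros(dim) for t in tasks}
        for _ in range(steps):
            grads = {t: np.zeros(dim) for t in tasks}
            for t, (X, y) in tasks.items():
                grads[t] += X.T @ (X @ W[t] - y) / len(y)  # data term
                if parent[t] is not None:
                    diff = W[t] - W[parent[t]]             # coupling term pulls
                    grads[t] += lam * diff                 # the child toward its
                    grads[parent[t]] -= lam * diff         # parent and vice versa
            for t in tasks:
                W[t] -= lr * grads[t]
        return W

    # Toy usage: the leaf task shares most of its weight vector with the root.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 5))
    tasks = {"root": (X, X @ np.ones(5)), "leaf": (X, X @ np.ones(5) + 0.1)}
    W = hierarchical_mtl(tasks, {"root": None, "leaf": "root"}, dim=5)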
Efficient and Accurate Lp-Norm Multiple Kernel Learning
Kloft, Marius, Brefeld, Ulf, Laskov, Pavel, Müller, Klaus-Robert, Zien, Alexander, Sonnenburg, Sören
Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations and hence support interpretability. Unfortunately, L1-norm MKL is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary Lp-norms. We provide new insights into the connections between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster than the traditionally used wrapper approaches. Finally, we apply Lp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies beyond the state of the art.
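As a sketch of how non-sparse kernel weights arise, the following Python snippet implements the simple wrapper variant of Lp-norm MKL: alternate between an SVM solve on the weighted kernel sum and the closed-form weight update beta_m proportional to ||w_m||^(2/(p+1)), renormalized so that the Lp-norm of beta equals one. The paper's interleaved strategies are much faster than such a wrapper; sklearn's SVC merely stands in for the inner SVM solver, and C, p, and the iteration count are illustrative.

    import numpy as np
    from sklearn.svm import SVC

    def lp_mkl(kernels, y, p=2.0, C=1.0, iters=20):
        M = len(kernels)
        beta = np.full(M, M ** (-1.0 / p))          # uniform start, ||beta||_p = 1
        for _ in range(iters):
            K = sum(b * Km for b, Km in zip(beta, kernels))
            svm = SVC(C=C, kernel="precomputed").fit(K, y)
            a = svm.dual_coef_.ravel()              # y_i * alpha_i on support vecs
            sv = svm.support_
            # squared block norms ||w_m||^2 = beta_m^2 * alpha^T Y K_m Y alpha
            norms = np.array([b ** 2 * a @ Km[np.ix_(sv, sv)] @ a
                              for b, Km in zip(beta, kernels)])
            beta = norms ** (1.0 / (p + 1))         # closed-form Lp-norm update
            beta /= np.linalg.norm(beta, ord=p)     # renormalize to ||beta||_p = 1
        return beta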
Large Scale Hidden Semi-Markov SVMs
Rätsch, Gunnar, Sonnenburg, Sören
We describe Hidden Semi-Markov Support Vector Machines (SHM SVMs), an extension of HM SVMs to semi-Markov chains. This allows us to predict segmentations of sequences based on segment-based features measuring properties such as the length of the segment. We propose a novel technique to partition the problem into sub-problems. The independently obtained partial solutions can then be recombined in an efficient way, which allows us to solve label sequence learning problems with several thousands of labeled sequences. We have tested our algorithm for predicting gene structures, an important problem in computational biology. Results on a well-known model organism illustrate the great potential of SHM SVMs in computational biology.
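The segment-level decoding that makes these models semi-Markov can be sketched as a small dynamic program over segment boundaries. In the Python sketch below, score(y, i, j) and the transition matrix stand in for a trained model's segment-based features (such as segment length); the uniform start scores and toy inputs are simplifications.

    import numpy as np

    def semi_markov_viterbi(n, n_labels, score, trans, max_len):
        """Best-scoring segmentation of positions 0..n-1 into labeled segments."""
        best = np.full((n + 1, n_labels), -np.inf)  # best[j, y]: prefix [0, j)
        back = np.zeros((n + 1, n_labels, 2), dtype=int)
        best[0, :] = 0.0                            # simplified: any start label
        for j in range(1, n + 1):
            for y in range(n_labels):
                for d in range(1, min(max_len, j) + 1):   # candidate length d
                    i = j - d
                    for yp in range(n_labels):
                        s = best[i, yp] + trans[yp, y] + score(y, i, j)
                        if s > best[j, y]:
                            best[j, y] = s
                            back[j, y] = (i, yp)
        y, segs, j = int(np.argmax(best[n])), [], n
        while j > 0:                                # backtrack segment by segment
            i, yp = back[j, y]
            segs.append((i, j, y))
            j, y = i, yp
        return segs[::-1]                           # [(start, end, label), ...]

    # Toy usage: two labels whose scores grow with the segment's length.
    print(semi_markov_viterbi(6, 2, lambda y, i, j: (j - i) * (0.5 if y else 0.2),
                              np.zeros((2, 2)), max_len=3))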
Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
Rieck, Konrad, Laskov, Pavel, Sonnenburg, Sören
We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.
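For intuition about the measures being unified, rather than the suffix-tree algorithm itself (which never materializes the embedding), here is a naive Python sketch over an explicit k-gram embedding; the generalized suffix tree computes the same quantities in time linear in the sequence lengths. The example strings and k are arbitrary.

    from collections import Counter

    def kgrams(s, k):
        return Counter(s[i:i + k] for i in range(len(s) - k + 1))

    def linear_kernel(x, y):                # inner product of k-gram counts
        return sum(x[g] * y[g] for g in x.keys() & y.keys())

    def manhattan(x, y):                    # a metric distance
        return sum(abs(x[g] - y[g]) for g in x.keys() | y.keys())

    def jaccard(x, y):                      # a non-metric similarity coefficient
        union = len(x.keys() | y.keys())
        return len(x.keys() & y.keys()) / union if union else 1.0

    a, b = kgrams("GATTACA", 3), kgrams("GATTTACA", 3)
    print(linear_kernel(a, b), manhattan(a, b), jaccard(a, b))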
A General and Efficient Multiple Kernel Learning Algorithm
Sonnenburg, Sören, Rätsch, Gunnar, Schäfer, Christin
While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps with automatic model selection, improves the interpretability of the learning result, and works for hundreds of thousands of examples or hundreds of kernels to be combined.
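The semi-infinite linear program lends itself to column generation: an SVM trained on the current kernel mixture supplies the most violated constraint via S_k(alpha) = 1/2 alpha^T Y K_k Y alpha - sum_i alpha_i, and a small LP over all constraints collected so far updates the mixture. The Python sketch below illustrates this loop with sklearn and scipy standing in for the solvers used in the paper; C, the fixed iteration count, and the missing convergence test are simplifications.

    import numpy as np
    from scipy.optimize import linprog
    from sklearn.svm import SVC

    def silp_mkl(kernels, y, C=1.0, iters=30):
        M = len(kernels)
        beta = np.full(M, 1.0 / M)
        rows = []                                  # one S(alpha) vector per cut
        for _ in range(iters):
            K = sum(b * Km for b, Km in zip(beta, kernels))
            svm = SVC(C=C, kernel="precomputed").fit(K, y)
            a = svm.dual_coef_.ravel()             # y_i * alpha_i on support vecs
            sv = svm.support_
            alpha_sum = np.abs(a).sum()            # equals sum_i alpha_i
            rows.append(np.array([0.5 * a @ Km[np.ix_(sv, sv)] @ a - alpha_sum
                                  for Km in kernels]))
            # LP: max theta  s.t.  sum_k beta_k S_k(alpha) >= theta for each cut,
            # sum_k beta_k = 1, beta >= 0; variables are (beta_1..beta_M, theta).
            c = np.zeros(M + 1)
            c[-1] = -1.0                           # linprog minimizes, so -theta
            A_ub = np.hstack([-np.array(rows), np.ones((len(rows), 1))])
            res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(rows)),
                          A_eq=[[1.0] * M + [0.0]], b_eq=[1.0],
                          bounds=[(0, None)] * M + [(None, None)])
            beta = res.x[:M]
        return beta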