An Application of Boosting to Graph Classification

Neural Information Processing Systems 

This paper presents an application of Boosting for classifying labeled graphs, general structures for modeling a number of real-world data, such as chemical compounds, natural language texts, and bio sequences. The proposal consists of i) decision stumps that use subgraph as features, and ii) a Boosting algorithm in which subgraph-based decision stumps are used as weak learners. We also discuss the relation between our al- gorithm and SVMs with convolution kernels. Two experiments using natural language data and chemical compounds show that our method achieves comparable or even better performance than SVMs with convo- lution kernels as well as improves the testing efficiency. Most machine learning (ML) algorithms assume that given instances are represented in numerical vectors. However, much real-world data is not represented as numerical vectors, but as more complicated structures, such as sequences, trees, or graphs.