Diversifying Support Vector Machines for Boosting using Kernel Perturbation: Applications to Class Imbalance and Small Disjuncts
Shounak Datta, Sayak Nag, Sankha Subhra Mullick, Swagatam Das
Abstract--The diversification (generating slightly varying separating discriminators) of Support Vector Machines (SVMs) for boosting has proven to be a challenge due to the strong learning nature of SVMs. Based on the insight that perturbing the SVM kernel may help in diversifying SVMs, we propose two kernel perturbation based boosting schemes where the kernel is modified in each round so as to increase the resolution of the kernel-induced Riemannian metric in the vicinity of the datapoints misclassified in the previous round. We propose a method for identifying the disjuncts in a dataset, dispelling the dependence on rule-based learning methods for identifying the disjuncts. We also present a new performance measure called Geometric Small Disjunct Index (GSDI) to quantify the performance on small disjuncts for balanced as well as class imbalanced datasets. Experimental comparison with a variety of state-of-the-art algorithms is carried out using the best classifiers of each type, selected by a new approach inspired by multi-criteria decision making. The proposed method is found to outperform the contending state-of-the-art methods on different datasets (ranging from mildly imbalanced to highly imbalanced and characterized by varying numbers of disjuncts) in terms of three different performance indices (including the proposed GSDI).

SUPPORT Vector Machines (SVMs) [1] are a family of popular classifiers with an elegant mathematical basis that can model both linear and nonlinear (using the kernel trick) decision boundaries. The kernel trick maps the data to a higher-dimensional feature space in order to facilitate linear separability between classes that are not linearly separable in the native input space.

Shounak Datta, Sankha Subhra Mullick, and Swagatam Das are with the Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata, India.
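The abstract's core idea — rescaling the kernel so that the induced Riemannian metric has higher resolution near previously misclassified points — can be illustrated with a conformal kernel transformation in the spirit of Amari and Wu. The sketch below is a hedged illustration only: the function names, the form of the conformal factor, and the parameters `gamma` and `tau` are assumptions for exposition, not the paper's exact perturbation scheme.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Base RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def conformal_factor(X, misclassified, tau=1.0):
    """c(x) = 1 + sum_i exp(-||x - m_i||^2 / tau^2), large near the
    misclassified points m_i from the previous boosting round."""
    d2 = ((X[:, None, :] - misclassified[None, :, :]) ** 2).sum(-1)
    return 1.0 + np.exp(-d2 / tau ** 2).sum(axis=1)

def perturbed_kernel(X, Y, misclassified, gamma=1.0, tau=1.0):
    """Conformally perturbed kernel: k~(x, y) = c(x) * c(y) * k(x, y).
    Multiplying by c(x) * c(y) keeps the kernel positive semi-definite
    while magnifying the induced metric where c is large."""
    cx = conformal_factor(X, misclassified, tau)
    cy = conformal_factor(Y, misclassified, tau)
    return cx[:, None] * cy[None, :] * rbf_kernel(X, Y, gamma)
```

For a point sitting on a misclassified location, the conformal factor is about 2, so the diagonal kernel value is roughly quadrupled, while points far from all misclassified samples are left essentially unchanged — exactly the locally increased resolution the boosting scheme relies on.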
Sayak Nag is with the Department of Instrumentation and Electronics Engineering, Jadavpur University, Kolkata, India.

While highly effective for non-overlapping classes, the performance of SVMs suffers in the case of overlapping classes, owing to data irregularities such as class imbalance (under-represented classes) [2]-[4] and small disjuncts (under-represented sub-concepts within classes) [5]-[7]. Class imbalance often results in greater misclassification of the minority class.
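One standard way to see why imbalance hurts — and a common baseline remedy — is cost-sensitive reweighting: each class is assigned a misclassification weight inversely proportional to its frequency, so errors on the minority class cost more. The sketch below is a generic illustration of this idea (the formula matches scikit-learn's "balanced" heuristic), not the boosting scheme proposed in the paper.

```python
import numpy as np

def balanced_class_weights(y):
    """Return per-class weights w_c = n_samples / (n_classes * n_c),
    so that rarer classes receive proportionally larger weights."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))
```

For a 90/10 split, the minority class gets weight 5.0 and the majority class roughly 0.56, counteracting the classifier's bias toward the majority.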
Dec-22-2017