Support Vector Machine Active Learning Algorithms with Query-by-Committee versus Closest-to-Hyperplane Selection

Bloodgood, Michael

arXiv.org Machine Learning 

Active learning has received considerable interest as a way to reduce annotation costs for text and speech processing applications [1], [2], [3], [4], [5], [6]. Many such applications share three characteristics: 1) their data sets are imbalanced, 2) annotating training data is a burden, and 3) support vector machines (SVMs) can train high-performing systems for the application. Two examples are Text Classification (TC) and Relation Extraction (RE). Characteristics 2 and 3 suggest the use of active learning (AL) with SVMs, abbreviated AL-SVM. Previous work has presented an AL-SVM algorithm that selects (i.e., requests labels for) the examples that are closest to the current model's hyperplane [7], [8], [9], [10]. This "closest"-based algorithm has been shown to need modification for imbalanced data situations [11]. Previous work has also presented a method for adapting AL-SVM to imbalanced data by using asymmetric cost factors during model training [11]. The asymmetric cost model has been shown to be most effective when it is based on prevalence statistics from an unbiased initial sample of the data and serves to amplify the weight of the minority positive examples.
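The two ideas described above, closest-to-hyperplane selection and an asymmetric cost factor derived from an initial sample, can be illustrated with a minimal sketch. It assumes a linear model given as a weight vector `w` and bias `b`, and it assumes the cost factor is the ratio of negative to positive examples in the unbiased initial sample. Neither the function names nor that exact ratio come from the abstract; they are illustrative choices only.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score of x under the linear model w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_closest(
    w: Sequence[float],
    b: float,
    pool: Sequence[Sequence[float]],
    batch_size: int,
) -> List[int]:
    """Return indices of the batch_size unlabeled examples nearest the hyperplane."""
    norm = sum(wi * wi for wi in w) ** 0.5
    if norm == 0:
        raise ValueError("weight vector must be non-zero")
    # Geometric distance to the hyperplane is |w.x + b| / ||w||.
    distances = [abs(decision_value(w, b, x)) / norm for x in pool]
    ranked = sorted(range(len(pool)), key=lambda i: distances[i])
    return ranked[:batch_size]


def positive_cost_factor(initial_labels: Sequence[int]) -> float:
    """Assumed heuristic: weight positives by (#negatives / #positives)
    measured on an unbiased initial sample, with labels +1 and -1."""
    positives = sum(1 for y in initial_labels if y == 1)
    negatives = len(initial_labels) - positives
    if positives == 0:
        raise ValueError("initial sample contains no positive examples")
    return negatives / positives


if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0
    pool = [[3.0, 0.0], [1.0, 0.9], [0.0, 2.0], [0.5, 0.3]]
    # Indices of the two pool examples to send for annotation.
    print(select_closest(w, b, pool, batch_size=2))
    # Weight to apply to positive-class errors when retraining.
    print(positive_cost_factor([1, -1, -1, -1, -1]))
```

In a real AL-SVM loop, the selected examples would be labeled and added to the training set, and the model retrained with the positive class weighted by the cost factor (for instance through a per-class weight option, if the SVM library in use offers one) before the next round of selection.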
