Support Vector Machine Active Learning Algorithms with Query-by-Committee versus Closest-to-Hyperplane Selection
Active learning has received considerable interest as a way to reduce annotation costs for text and speech processing applications [1], [2], [3], [4], [5], [6]. Many applications share three characteristics: 1) their data sets are imbalanced, 2) annotating training data is a burden, and 3) support vector machines (SVMs) can train high-performing systems for them. Two examples of such applications are Text Classification (TC) and Relation Extraction (RE). Characteristics 2 and 3 suggest the use of AL-SVM (Active Learning (AL) with Support Vector Machines). Previous work has presented an AL-SVM algorithm that selects (i.e., requests labels for) the examples that are closest to the current model's hyperplane [7], [8], [9], [10]. This "closest"-based algorithm has been shown to need modification for imbalanced data situations [11]. One such modification, also from previous work, adapts AL-SVM to imbalanced data by using asymmetric cost factors during model training [11]. The asymmetric cost model has been shown to be most effective when the cost factors are based on prevalence statistics from an unbiased initial sample of data and serve to amplify the minority positive examples.
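The two ideas above, closest-to-hyperplane selection and a prevalence-based cost factor, can be illustrated with a minimal sketch. It assumes a linear model given directly as a weight vector `w` and bias `b` (no SVM is trained here), labels encoded as +1 and -1, and a cost factor equal to the ratio of negatives to positives in the initial sample. The function names and that ratio are illustrative assumptions, not the exact formulation used in the cited work.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b; its magnitude is proportional to the distance from the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_closest(w: Sequence[float], b: float,
                   pool: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the batch_size unlabeled examples with the smallest |w.x + b|."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: abs(decision_value(w, b, pool[i])))
    return ranked[:batch_size]


def positive_cost_factor(initial_labels: Sequence[int]) -> float:
    """Assumed cost factor: negatives per positive in an unbiased initial sample."""
    positives = sum(1 for y in initial_labels if y == 1)
    negatives = len(initial_labels) - positives
    if positives == 0:
        raise ValueError("initial sample contains no positive examples")
    return negatives / positives


# Hypothetical toy data: a 2-D linear model and four unlabeled points.
w, b = [1.0, -1.0], 0.0
pool = [[2.0, 0.0], [0.5, 0.4], [1.0, 1.0], [-3.0, 1.0]]
to_label = select_closest(w, b, pool, batch_size=2)

# One positive among five initial labels gives a factor of 4.
cost = positive_cost_factor([1, -1, -1, -1, -1])
```

In a full active learning loop, the selected indices would be sent for annotation, the model retrained with the cost factor applied to errors on positive examples, and the selection repeated on the remaining pool.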
Jan-24-2018
- Country:
  - Asia > South Korea (0.04)
  - Europe
    - Bulgaria > Sofia City Province
      - Sofia (0.04)
    - Italy > Trentino-Alto Adige/Südtirol
      - Trentino Province > Trento (0.04)
    - Slovenia > Upper Carniola
      - Municipality of Bled > Bled (0.04)
    - Sweden > Uppsala County
      - Uppsala (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.14)
  - North America
    - Canada > Alberta
    - United States
      - California
        - Orange County > Laguna Hills (0.14)
        - San Diego County > San Diego (0.04)
        - San Francisco County > San Francisco (0.04)
      - Colorado > Boulder County
        - Boulder (0.04)
      - Maryland > Montgomery County
        - Bethesda (0.04)
      - New Jersey > Mercer County
        - Ewing (0.14)
      - New York > New York County
        - New York City (0.04)
      - Pennsylvania > Allegheny County
        - Pittsburgh (0.04)
- Genre:
  - Research Report
    - Experimental Study (0.68)
    - New Finding (0.68)
- Technology: