Stopping Active Learning based on Predicted Change of F Measure for Text Classification

Jan-25-2019–arXiv.org Machine Learning

Abstract--During active learning, an effective stopping method allows users to limit the number of annotations, which reduces cost. This paper introduces a new stopping method, Predicted Change of F Measure, that attempts to give users an estimate of how much the model's performance is changing at each iteration. The stopping method can be applied with any base learner. It is useful for reducing the data annotation bottleneck encountered when building text classification systems.

I. INTRODUCTION

Active learning has been used to train machine learning models as a way to reduce annotation costs for text and speech processing applications [1], [2], [3], [4], [5]. It has been shown to have a particularly large potential for reducing annotation cost for text classification [6], [7]. Text classification is one of the most important fields in semantic computing and has been used in many applications [8], [9], [10], [11], [12].

A. Active Learning

Active learning is a form of machine learning that gives the model the ability to select the data from which it learns and to choose when to end the training process. In active learning, the model is first trained on a small batch of annotated data. Then, in each following iteration, the model selects a small batch and removes this batch from a large unlabeled set of examples.
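The pool-based loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the names `active_learning_loop`, `train_threshold`, `select_closest`, and `oracle` are hypothetical, the toy learner is a one-dimensional threshold, the selection rule (picking the points closest to the current threshold) is an assumed stand-in, and the fixed `max_iterations` cap is a placeholder where a stopping method such as the one introduced in this paper would go.

```python
import random


def train_threshold(labeled):
    """Toy base learner (assumption): fit a 1-D decision threshold."""
    pos = [x for x, y in labeled if y]
    neg = [x for x, y in labeled if not y]
    if not pos or not neg:
        # Only one class seen so far: fall back to the mean of the inputs.
        return sum(x for x, _ in labeled) / len(labeled)
    return (min(pos) + max(neg)) / 2


def select_closest(threshold, unlabeled, k):
    """Assumed selection rule: the k points nearest the current threshold."""
    return sorted(unlabeled, key=lambda x: abs(x - threshold))[:k]


def active_learning_loop(pool, oracle, seed_size=2, batch_size=2,
                         max_iterations=3, rng=None):
    """Pool-based active learning with a placeholder stopping rule."""
    rng = rng or random.Random(0)
    unlabeled = list(pool)

    # Step 1: start from a small batch of annotated data.
    seed = rng.sample(unlabeled, seed_size)
    labeled = [(x, oracle(x)) for x in seed]
    for x in seed:
        unlabeled.remove(x)
    model = train_threshold(labeled)

    # Step 2: in each iteration, select a small batch, remove it from
    # the unlabeled pool, have it annotated, and retrain.
    for _ in range(max_iterations):  # placeholder for a real stopping method
        if not unlabeled:
            break
        batch = select_closest(model, unlabeled, batch_size)
        for x in batch:
            unlabeled.remove(x)
            labeled.append((x, oracle(x)))
        model = train_threshold(labeled)

    return model, labeled, unlabeled


def oracle(x):
    """Hypothetical annotator: the true label is whether x >= 0.5."""
    return x >= 0.5


if __name__ == "__main__":
    pool = [i / 10 for i in range(10)]
    model, labeled, unlabeled = active_learning_loop(pool, oracle)
    print(len(labeled), "annotated;", len(unlabeled), "left in the pool")
```

In this sketch the annotation cost is simply the length of `labeled`, which is why a good stopping rule matters: every extra iteration spends `batch_size` more annotations.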

Country:
- Asia > South Korea (0.04)
- North America > United States
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)
  - New York > New York County
    - New York City (0.04)
  - New Jersey > Mercer County
    - Ewing (0.14)
  - Maryland > Montgomery County
    - Bethesda (0.04)
  - Colorado > Boulder County
    - Boulder (0.04)
  - California
    - San Diego County > San Diego (0.04)
    - San Francisco County > San Francisco (0.04)
    - Orange County
      - Newport Beach (0.04)
      - Laguna Hills (0.04)
- Europe
  - Czechia > Prague (0.04)
  - Sweden > Uppsala County
    - Uppsala (0.04)
  - Bulgaria > Sofia City Province
    - Sofia (0.04)

Genre:
- Research Report > Experimental Study (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Classification (1.00)
  - Machine Learning > Statistical Learning
    - Support Vector Machines (0.30)
