On the Relationship between Data Efficiency and Error for Uncertainty Sampling

Jun-15-2018, arXiv.org Machine Learning

While active learning offers potential cost savings, the actual data efficiency (the reduction in the amount of labeled data needed to obtain the same error rate) observed in practice is mixed. This paper poses a basic question: when is active learning actually helpful? We provide an answer for logistic regression with the popular active learning algorithm, uncertainty sampling. Empirically, on 21 datasets from OpenML, we find a strong inverse correlation between data efficiency and the error rate of the final classifier. Theoretically, we show that for a variant of uncertainty sampling, the asymptotic data efficiency is within a constant factor of the inverse error rate of the limiting classifier.
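For readers unfamiliar with the setting, the following is a minimal sketch of textbook uncertainty sampling with logistic regression: at each step the current model is refit and the unlabeled point whose predicted probability is closest to 0.5 is sent for labeling. It is not the specific variant analyzed in the paper. The function names, the synthetic two-cluster data, the seed-set size, the label budget, and the plain gradient-descent fit are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w holds one weight per feature followed by a bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logistic(X, y, steps=200, lr=0.5):
    # Batch gradient descent on the average log loss (illustrative, not tuned).
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def uncertainty_sampling(pool_X, oracle, n_seed=10, budget=40, seed=0):
    # oracle(i) returns the label of pool point i (hypothetical labeling interface).
    rng = random.Random(seed)
    unlabeled = list(range(len(pool_X)))
    rng.shuffle(unlabeled)
    # Start from a small random seed set.
    labeled = [unlabeled.pop() for _ in range(n_seed)]
    labels = {i: oracle(i) for i in labeled}
    while len(labeled) < budget and unlabeled:
        w = fit_logistic([pool_X[i] for i in labeled],
                         [labels[i] for i in labeled])
        # Query the point the current model is least certain about.
        pick = min(unlabeled,
                   key=lambda i: abs(predict_proba(w, pool_X[i]) - 0.5))
        unlabeled.remove(pick)
        labels[pick] = oracle(pick)
        labeled.append(pick)
    w = fit_logistic([pool_X[i] for i in labeled],
                     [labels[i] for i in labeled])
    return w, labeled


# Synthetic pool: two overlapping Gaussian clusters (assumed data, not from the paper).
data_rng = random.Random(0)
pool_y = [i % 2 for i in range(200)]
pool_X = [(data_rng.gauss(2 * c - 1, 1.0), data_rng.gauss(2 * c - 1, 1.0))
          for c in pool_y]

w, queried = uncertainty_sampling(pool_X, lambda i: pool_y[i])
error = sum((predict_proba(w, x) >= 0.5) != bool(c)
            for x, c in zip(pool_X, pool_y)) / len(pool_X)
print(f"labels used: {len(queried)}, pool error rate: {error:.3f}")
```

To estimate data efficiency in the sense of the abstract, one would run the same loop with random selection in place of the `min(...)` query and compare how many labels each strategy needs to reach the same error rate.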

artificial intelligence, data efficiency and error, machine learning, (11 more...)


Country:
- North America > United States
  - New York (0.04)
  - California > Santa Clara County
    - Stanford (0.04)
    - Palo Alto (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)

Genre:
- Research Report > New Finding (0.67)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Performance Analysis > Accuracy (0.88)
  - Learning Graphical Models > Directed Networks
    - Bayesian Learning (0.46)
