On the Relationship between Data Efficiency and Error for Uncertainty Sampling
Stephen Mussmann, Percy Liang
While active learning offers potential cost savings, the actual data efficiency (the reduction in the amount of labeled data needed to obtain the same error rate) observed in practice is mixed. This paper poses a basic question: when is active learning actually helpful? We provide an answer for logistic regression with the popular active learning algorithm, uncertainty sampling. Empirically, on 21 datasets from OpenML, we find a strong inverse correlation between data efficiency and the error rate of the final classifier. Theoretically, we show that for a variant of uncertainty sampling, the asymptotic data efficiency is within a constant factor of the inverse error rate of the limiting classifier.
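For readers unfamiliar with the setup, the following is a minimal sketch of generic pool-based uncertainty sampling with logistic regression. It is not the specific variant analyzed in the paper. The function names (`fit_logistic`, `predict_proba`, `uncertainty_sampling`), the plain gradient-descent fit, and the defaults for `n_seed` and `n_queries` are all assumptions made for illustration.

```python
import numpy as np


def _sigmoid(z):
    # Clip the logits to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def _add_bias(X):
    # Append a constant column so the model learns an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_logistic(X, y, steps=500, lr=0.5):
    """Fit logistic regression weights by plain gradient descent (illustrative)."""
    Xb = _add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = _sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    """Predicted probability of the positive class for each row of X."""
    return _sigmoid(_add_bias(X) @ w)


def uncertainty_sampling(X_pool, y_pool, n_seed=10, n_queries=40, seed=0):
    """Label a random seed set, then repeatedly query the pool point whose
    predicted probability is closest to 0.5 under the current model."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_seed, replace=False)]
    for _ in range(n_queries):
        w = fit_logistic(X_pool[labeled], y_pool[labeled])
        # Higher score means the model is less certain about the point.
        score = -np.abs(predict_proba(w, X_pool) - 0.5)
        score[labeled] = -np.inf  # never re-query an already labeled point
        labeled.append(int(np.argmax(score)))
    return labeled, fit_logistic(X_pool[labeled], y_pool[labeled])
```

Under these assumptions, data efficiency could be estimated by comparing how many randomly sampled labels are needed to match the error rate that the actively selected labels reach.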
Jun-15-2018
- Country:
- North America > United States
- New York (0.04)
- California > Santa Clara County
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Sweden > Stockholm
- Stockholm (0.04)
- Genre:
- Research Report > New Finding (0.67)