Cost Effective Active Search

Jiang, Shali, Garnett, Roman, Moseley, Benjamin

Neural Information Processing Systems 

We study a special paradigm of active learning, called cost effective active search, where the goal is to find a given number of positive points from a large unlabeled pool with minimum labeling cost. Most existing methods solve this problem heuristically, and few theoretical results have been established. We adopt a principled Bayesian approach for the first time. We first derive the Bayesian optimal policy and establish a strong hardness result: the optimal policy is hard to approximate, with the best-possible approximation ratio lower bounded by $\Omega(n {0.16})$. We then propose an efficient and nonmyopic policy using the negative Poisson binomial distribution.