Reviews: Cost Effective Active Search

Neural Information Processing Systems 

The paper considers a Bayesian decision-theoretic formulation of the problem of minimizing the number of queries needed to identify a desired number of positive instances (instances with positive labels), given a probabilistic model of the labels in the dataset. The formulation is motivated by material and drug discovery problems. The problem is properly formulated and contrasted with the recently suggested budgeted-learning setting, where the goal is to identify the largest number of positive instances given a fixed budget of queries. The authors further show that the optimal Bayesian policy is hard to compute and hard to approximate. However, under a certain conditional independence assumption, the policy can be approximated efficiently using the negative Poisson binomial distribution, for which the authors propose computationally cheap expectation estimates. The resulting policy is compared to several alternatives and is shown to obtain overall superior performance on both material discovery and drug discovery datasets.
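
To make the quantity behind the approximation concrete, here is a minimal sketch of the expectation of a negative Poisson binomial variable: the number of queries T needed to collect a target number of positives when query i is positive independently with probability p_i. It is not the authors' cheap estimator. It is a plain exact dynamic program, and the function name `expected_queries`, its arguments, and the choice to cap T at the pool size are all assumptions made for illustration. It uses the identity E[min(T, n)] = sum over t from 0 to n-1 of P(fewer than `target` positives among the first t queries).

```python
def expected_queries(probs, target):
    """Expected number of queries until `target` positives are found.

    Hypothetical helper: assumes query i is positive independently with
    probability probs[i], and caps the count at len(probs) when the pool
    runs out before the target is reached.
    """
    if target <= 0:
        return 0.0
    # dist[k] = P(exactly k positives so far), tracked only for k < target
    dist = [1.0] + [0.0] * (target - 1)
    expected = 0.0
    for p in probs:
        # sum(dist) = P(T > t): target not yet reached before this query
        expected += sum(dist)
        new_dist = [0.0] * target
        for k, mass in enumerate(dist):
            new_dist[k] += mass * (1.0 - p)      # query is negative
            if k + 1 < target:
                new_dist[k + 1] += mass * p      # query is positive
            # mass that reaches `target` leaves the table (search stops)
        dist = new_dist
    return expected


# Usage: query in decreasing order of marginal probability (an assumed order).
marginals = sorted([0.2, 0.9, 0.6, 0.4], reverse=True)
print(expected_queries(marginals, target=2))
```

The dynamic program costs O(n * target) time per evaluation, which gives some intuition for why cheaper expectation estimates matter when this quantity has to be evaluated repeatedly inside a search policy.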