Goto

Collaborating Authors

 batch active search


Efficient nonmyopic batch active search

Neural Information Processing Systems

Active search is a learning paradigm for actively identifying as many members of a given class as possible. A critical target scenario is high-throughput screening for scientific discovery, such as drug or materials discovery. In these settings, specialized instruments can often evaluate \emph{multiple} points simultaneously; however, all existing work on active search focuses on sequential acquisition.



Reviews: Efficient nonmyopic batch active search

Neural Information Processing Systems

This work investigates the different batch-mode extensions of an active search method called efficient nonmyopic policy (ENS) [12]. ENS achieves good performance efficiently because it assumes the sample selections are independent after a step [12]. This paper proposes two strategies: 1) converting the batch active search problem to sequential one by guessing the hidden labels of selected samples 2) try to enumerate all possible hidden labels of selected samples by Monte Carlo. Strength: Allowing to sample batches is important in practice. The work addresses several theoretical and practical challenges in many aspects of batch active search such as how difficult batch active search could be, why pessimistic oracle works well, and how to make the methods more efficient by pruning.


Efficient nonmyopic batch active search

Jiang, Shali, Malkomes, Gustavo, Abbott, Matthew, Moseley, Benjamin, Garnett, Roman

Neural Information Processing Systems

Active search is a learning paradigm for actively identifying as many members of a given class as possible. A critical target scenario is high-throughput screening for scientific discovery, such as drug or materials discovery. In these settings, specialized instruments can often evaluate \emph{multiple} points simultaneously; however, all existing work on active search focuses on sequential acquisition. We first derive the Bayesian optimal policy for this problem, then prove a lower bound on the performance gap between sequential and batch optimal policies: the cost of parallelization.'' We also propose novel, efficient batch policies inspired by state-of-the-art sequential policies, and develop an aggressive pruning technique that can dramatically speed up computation.


Efficient nonmyopic batch active search

Jiang, Shali, Malkomes, Gustavo, Abbott, Matthew, Moseley, Benjamin, Garnett, Roman

Neural Information Processing Systems

Active search is a learning paradigm for actively identifying as many members of a given class as possible. A critical target scenario is high-throughput screening for scientific discovery, such as drug or materials discovery. In these settings, specialized instruments can often evaluate \emph{multiple} points simultaneously; however, all existing work on active search focuses on sequential acquisition. We bridge this gap, addressing batch active search from both the theoretical and practical perspective. We first derive the Bayesian optimal policy for this problem, then prove a lower bound on the performance gap between sequential and batch optimal policies: the ``cost of parallelization.'' We also propose novel, efficient batch policies inspired by state-of-the-art sequential policies, and develop an aggressive pruning technique that can dramatically speed up computation. We conduct thorough experiments on data from three application domains: a citation network, material science, and drug discovery, testing all proposed policies (14 total) with a wide range of batch sizes. Our results demonstrate that the empirical performance gap matches our theoretical bound, that nonmyopic policies usually significantly outperform myopic alternatives, and that diversity is an important consideration for batch policy design.


Efficient nonmyopic batch active search

Jiang, Shali, Malkomes, Gustavo, Abbott, Matthew, Moseley, Benjamin, Garnett, Roman

Neural Information Processing Systems

Active search is a learning paradigm for actively identifying as many members of a given class as possible. A critical target scenario is high-throughput screening for scientific discovery, such as drug or materials discovery. In these settings, specialized instruments can often evaluate \emph{multiple} points simultaneously; however, all existing work on active search focuses on sequential acquisition. We bridge this gap, addressing batch active search from both the theoretical and practical perspective. We first derive the Bayesian optimal policy for this problem, then prove a lower bound on the performance gap between sequential and batch optimal policies: the ``cost of parallelization.'' We also propose novel, efficient batch policies inspired by state-of-the-art sequential policies, and develop an aggressive pruning technique that can dramatically speed up computation. We conduct thorough experiments on data from three application domains: a citation network, material science, and drug discovery, testing all proposed policies (14 total) with a wide range of batch sizes. Our results demonstrate that the empirical performance gap matches our theoretical bound, that nonmyopic policies usually significantly outperform myopic alternatives, and that diversity is an important consideration for batch policy design.