

Collapsing Bandits and Their Application to Public Health Interventions

Neural Information Processing Systems

Neither (i) nor (ii) is known for general RMABs. Therefore, to capture the scheduling problems addressed in this work, we introduce a new subclass of RMABs, Collapsing Bandits, distinguished by the following feature: when an arm is played, the agent fully observes its state, "collapsing" any uncertainty, but when an arm is passive, no observation is made and uncertainty evolves.
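As a concrete illustration of this observation model, here is a minimal Python sketch (our own, not from the paper) of the per-arm belief update: playing the arm collapses the belief onto the observed state before the active transition, while a passive round simply pushes the belief through the passive transition matrix. The function and variable names are ours.

```python
import numpy as np

def belief_update(belief_good, P_passive, P_active, played, observed_state=None):
    """One-round belief update for a single two-state arm.

    belief_good        : current probability that the arm is in the 'good' state (1).
    P_passive, P_active: 2x2 row-stochastic transition matrices (numpy arrays),
                         rows indexed by current state, columns by next state.
    played             : whether the arm was pulled this round.
    observed_state     : the state (0 or 1) observed when the arm is pulled.
    """
    if played:
        # Playing the arm reveals its state, collapsing all uncertainty,
        # and the arm then transitions under the active dynamics.
        P = P_active
        belief_good = float(observed_state)
    else:
        # A passive arm is not observed; uncertainty keeps evolving
        # under the passive dynamics.
        P = P_passive
    # Propagate the (possibly collapsed) belief one step forward.
    return belief_good * P[1, 1] + (1.0 - belief_good) * P[0, 1]
```

For instance, an arm with belief 0.4 left passive under P_passive = [[0.9, 0.1], [0.2, 0.8]] drifts to 0.4 * 0.8 + 0.6 * 0.1 = 0.38, while pulling it and observing the good state resets the belief to P_active[1, 1].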





Review for NeurIPS paper: Collapsing Bandits and Their Application to Public Health Interventions

Neural Information Processing Systems

The reviewers are all enthusiastic about the paper, though to varying degrees. The paper's main contribution is to the problem of planning for a class of partially observed restless bandits with two arm states each, for which a monotone transition probability structure holds; the paper argues that this structure is quite natural in several applications and demonstrates numerical results on one such setting involving medical interventions. It is shown that under this structure, the restless bandit is Whittle-indexable. Although the paper does not address a learning component, the hope is that such a structural characterization will open up avenues for further work on learning good policies when there is a priori uncertainty about the restless Markov decision processes.
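For readers who want to pin down what a "monotone transition probability structure" means operationally, the helper below sketches one plausible encoding of such constraints for a single two-state arm: acting makes the good state at least as reachable as staying passive, and the good state is at least as sticky as the bad one. The exact inequalities used in the paper may differ; these are our assumptions for illustration.

```python
import numpy as np

def satisfies_monotone_structure(P_passive, P_active):
    """Check an (assumed) monotone transition structure for one two-state arm.

    P_passive, P_active: 2x2 row-stochastic numpy arrays; entry [s, 1] is the
    probability of moving to the 'good' state 1 from state s.
    """
    return bool(
        # Acting never reduces the chance of reaching the good state...
        P_active[1, 1] >= P_passive[1, 1]
        and P_active[0, 1] >= P_passive[0, 1]
        # ...and the good state is at least as likely to persist as to be
        # reached from the bad state, under either action.
        and P_passive[1, 1] >= P_passive[0, 1]
        and P_active[1, 1] >= P_active[0, 1]
    )
```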


Collapsing Bandits and Their Application to Public Health Interventions

Mate, Aditya, Killian, Jackson A., Xu, Haifeng, Perrault, Andrew, Tambe, Milind

arXiv.org Artificial Intelligence

We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus "collapsing" any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the "good" state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either "forward" or "reverse" threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed-form expression. (iii) We evaluate our algorithm on several data distributions, including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients' adherence to tuberculosis medication. Our algorithm achieves a three-order-of-magnitude speedup over state-of-the-art RMAB techniques while achieving similar performance.
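As a rough sketch of how the planning step in (i)-(ii) would be used at decision time (this is not the authors' released code), a Whittle-index policy scores each arm from its current belief and plays the top-k arms within the budget; `whittle_index` below is a placeholder standing in for the paper's fast threshold-policy computation or closed-form expression.

```python
import numpy as np

def plan_round(beliefs, whittle_index, budget_k):
    """Select which arms to play this round under a Whittle-index policy.

    beliefs       : iterable of per-arm probabilities of being in the good state.
    whittle_index : callable mapping an arm's belief to its index; a placeholder
                    for the paper's fast per-arm index computation.
    budget_k      : number of arms that may be played this round.
    """
    indices = np.array([whittle_index(b) for b in beliefs])
    # Play the budget_k arms whose indices (the subsidy needed to justify
    # resting them) are largest.
    chosen = np.argsort(-indices)[:budget_k]
    return set(chosen.tolist())
```

Each round, the selected arms are observed (collapsing their beliefs) and every arm's belief is then propagated with an update like the one sketched earlier, before indices are recomputed for the next round.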