Combinatorial Pure Exploration of Multi-Armed Bandits

Shouyuan Chen, Tian Lin, Irwin King, Michael R. Lyu, Wei Chen

Neural Information Processing Systems 

The multi-armed bandit (MAB) problem is a predominant model for characterizing the tradeoff between exploration and exploitation in decision-making problems. Although this tradeoff is intrinsic in many tasks, some application domains prefer a dedicated exploration procedure, in which the goal is to identify an optimal object among a collection of candidates and the reward or loss incurred during exploration is irrelevant. In light of these applications, the related learning problem, called pure exploration in MABs, has received much attention. Recent advances in pure exploration of MABs have found potential applications in many domains, including crowdsourcing, communication networks, and online advertising. In many of these application domains, a recurring problem is to identify the optimal object with a certain combinatorial structure.
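To make the pure-exploration setting concrete, the following is a minimal sketch (not the paper's algorithm) of fixed-budget best-arm identification with uniform sampling: the learner pulls Bernoulli arms round-robin for a fixed budget, ignores the rewards it accumulates, and recommends the arm with the highest empirical mean. The function and parameter names here are illustrative assumptions.

```python
import random


def identify_best_arm(means, budget, seed=0):
    """Fixed-budget pure exploration by uniform sampling (illustrative sketch).

    Rewards collected during exploration are irrelevant; only the quality
    of the final recommendation matters.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    sums = [0.0] * n
    for t in range(budget):
        arm = t % n  # round-robin: each arm gets budget / n pulls
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward
    # Recommend the arm with the highest empirical mean.
    empirical = [sums[i] / counts[i] for i in range(n)]
    return max(range(n), key=lambda i: empirical[i])
```

With well-separated means and a moderate budget, the recommendation is correct with high probability; the combinatorial version studied in the paper generalizes the recommendation from a single arm to a structured subset of arms (e.g., a matching or spanning tree).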