Reviews: Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications

Neural Information Processing Systems 

Overview: The paper studies combinatorial multi-armed bandit with probabilistically triggered arms and semi-bandit feedback (CMAB-T). The paper improves the existing regret bounds by introducing a new version of the bounded smoothness condition on the reward functions, called triggering probability modulated bounded smoothness condition. The intuition behind this is that hard-to-trigger arms do not contribute much to the expected regret. With this new condition, the authors are able to remove a potentially exponentially large factor in the existing regret bounds. The paper then shows that applications such as influence maximization bandit and combinatorial cascading bandit both satisfy this condition.