Reviews: Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications
–Neural Information Processing Systems
Overview: The paper studies combinatorial multi-armed bandit with probabilistically triggered arms and semi-bandit feedback (CMAB-T). The paper improves the existing regret bounds by introducing a new version of the bounded smoothness condition on the reward functions, called triggering probability modulated bounded smoothness condition. The intuition behind this is that hard-to-trigger arms do not contribute much to the expected regret. With this new condition, the authors are able to remove a potentially exponentially large factor in the existing regret bounds. The paper then shows that applications such as influence maximization bandit and combinatorial cascading bandit both satisfy this condition.
Neural Information Processing Systems
Oct-8-2024, 06:12:01 GMT
- Technology: