Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications

Qinshi Wang, Wei Chen

Neural Information Processing Systems 

The goal of the player is to accumulate as much reward as possible over time.