Combinatorial Multi-Armed Bandit with General Reward Functions

Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu

Neural Information Processing Systems 

In this paper, unless otherwise specified, we use MAB to refer to stochastic MAB. MAB problem demonstrates the key tradeoff between exploration and exploitation: whether the player should stick to the choice that performs the best so far, or should try some less explored alternatives that may provide better rewards.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found