Categorized Bandits

Neural Information Processing Systems 

In the multi-armed bandit problem, an agent has several possible decisions, usually referred to as "arms", and sequentially chooses, or "pulls", one of them at each time step. This generates a sequence of rewards, and the objective is to maximize their cumulative sum.
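
The interaction loop described above can be illustrated with a minimal sketch. It assumes Bernoulli rewards and uses the standard UCB1 index as the decision rule; neither choice is taken from the paper, and the names `run_ucb1`, `arm_means`, and `horizon` are hypothetical.

```python
import math
import random


def run_ucb1(arm_means, horizon, seed=0):
    """Simulate a Bernoulli bandit played with UCB1.

    arm_means: assumed success probability of each arm (unknown to the agent).
    Returns (cumulative reward, number of pulls per arm).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # how many times each arm has been pulled
    sums = [0.0] * n_arms   # total reward collected from each arm
    total = 0.0             # cumulative reward, the quantity to maximize

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Pull the arm with the largest empirical mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Bernoulli reward drawn from the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return total, counts


if __name__ == "__main__":
    total, counts = run_ucb1([0.2, 0.5, 0.8], horizon=1000)
    print("cumulative reward:", total)
    print("pulls per arm:", counts)
```

Any other index policy could replace the UCB1 rule here; the loop structure (choose an arm, observe a reward, update statistics) stays the same.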
