Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget

Neural Information Processing Systems 

After each decision to choose a particular arm, the learner receives some form of feedback, typically a numerical reward, determined by the feedback mechanism of the chosen arm.
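
The interaction described above can be sketched as a simple loop. This is only an illustrative sketch, not the paper's algorithm: the function name `run_round_robin`, the round-robin selection rule, and the two example arms are all assumptions introduced here. Each arm's feedback mechanism is modeled as a deterministic function of how many times that arm has been pulled, which is one possible reading of a non-stochastic setting, and the budget caps the total number of pulls.

```python
from typing import Callable, Dict, List


def run_round_robin(
    arms: Dict[str, Callable[[int], float]], budget: int
) -> Dict[str, List[float]]:
    """Pull arms in round-robin order until the budget is spent.

    Each arm is a hypothetical feedback mechanism: a function mapping the
    number of previous pulls of that arm to a numerical reward.
    """
    history: Dict[str, List[float]] = {name: [] for name in arms}
    names = list(arms)
    for t in range(budget):
        name = names[t % len(names)]        # the learner's decision
        pulls_so_far = len(history[name])
        reward = arms[name](pulls_so_far)   # feedback from the chosen arm only
        history[name].append(reward)
    return history


# Hypothetical arms (assumed for illustration): rewards depend only on pull count.
arms = {
    "a": lambda n: 1.0 - 1.0 / (n + 1),
    "b": lambda n: 0.5,
}
history = run_round_robin(arms, budget=6)
```

With a budget of 6 and two arms, each arm is pulled three times, and the learner only ever observes the reward of the arm it chose in that round.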
