Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards

Neural Information Processing Systems 

The goal of the agent is to maximize the cumulative reward over time by identifying an optimal action, i.e., the action with the maximum expected reward. However, since MABs often assume that no prior knowledge about the rewards is given, the agent faces an innate dilemma between gathering new information by exploring sub-optimal actions (exploration) and choosing the best action based on the information collected so far (exploitation). Designing an efficient exploration algorithm for MABs is a long-standing challenge.
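To make the exploration/exploitation trade-off concrete, here is a minimal sketch of UCB1, a classical index policy for rewards bounded in [0, 1]. It is swapped in purely for illustration and is not the algorithm proposed in this paper, which targets heavy-tailed rewards. The Bernoulli arm means, horizon and seed below are hypothetical. Each round, the policy adds an exploration bonus to every arm's empirical mean, so rarely pulled arms keep being tried while the empirically best arm is exploited most of the time.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with hypothetical success probabilities `means`.

    Returns (pull count per arm, total reward). UCB1 assumes rewards in [0, 1];
    it is an illustration of exploration vs. exploitation, not a heavy-tailed method.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull each arm once so every empirical mean is defined.
            arm = t - 1
        else:
            # Index = empirical mean (exploitation) + confidence bonus (exploration).
            # The bonus shrinks as an arm is pulled more often.
            def index(i):
                return sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])

            arm = max(range(k), key=index)

        # Draw a Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return counts, total


counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, total)
```

With these hypothetical means, the best arm (index 2) should receive the large majority of pulls while the sub-optimal arms are still sampled occasionally. UCB1's confidence bonus relies on bounded rewards, an assumption that heavy-tailed reward distributions do not satisfy.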
