Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
Neural Information Processing Systems
The goal of the agent is to maximize its cumulative reward over time by identifying the optimal action, i.e., the action with the maximum expected reward. However, since MABs typically assume that no prior knowledge about the reward distributions is available, the agent faces an inherent dilemma between gathering new information by trying sub-optimal actions (exploration) and choosing the best action according to the information collected so far (exploitation). Designing an efficient exploration algorithm for MABs is a long-standing challenge.
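The exploration-exploitation trade-off can be made concrete with the classic UCB1 index policy (Auer et al., 2002). This is a minimal sketch and not the algorithm proposed in this paper. It assumes bounded Bernoulli rewards, which is exactly the assumption that the heavy-tailed setting relaxes. The function name `ucb1`, its parameters, and the arm means used below are hypothetical and chosen only for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run classic UCB1 on simulated Bernoulli arms.

    arm_means: hypothetical success probabilities, one per arm (bounded rewards).
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of times each arm has been pulled
    sums = [0.0] * k   # cumulative reward collected from each arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize its estimate
        else:
            # Exploitation term (empirical mean) plus an exploration bonus
            # that shrinks as an arm is pulled more often.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Simulated Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward


counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, total)
```

With well-separated hypothetical means such as these, most pulls should go to the last arm. The shrinking bonus term is what keeps the policy from abandoning an arm whose empirical mean happens to look poor after only a few samples; with heavy-tailed rewards the empirical mean itself becomes unreliable, which is the difficulty the paper's title refers to.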
Oct-3-2025, 01:16:31 GMT