Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
Neural Information Processing Systems
We also find the optimal hyperparameters for each perturbation, which achieve the minimax optimal regret bound with respect to the total number of rounds.
Feb-8-2026, 14:57:23 GMT