Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards

Neural Information Processing Systems 

We also find the optimal hyperparameters for each perturbation, which achieve the minimax optimal regret bound with respect to the total number of rounds.
