Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret

Neural Information Processing Systems 

The stochastic multi-armed bandit setting has been recently studied in the non-stationary regime, where the mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played. This model captures natural behavioral aspects of the users which crucially determine the performance of recommendation platforms, ad placement systems, and more.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found