Non

Neural Information Processing Systems 

The stochastic multi-armed bandit setting has recently been studied in the nonstationary regime, where the mean payoff of each action is a non-decreasing function of the number of rounds that have passed since it was last played. This model captures natural behavioral aspects of users that crucially determine the performance of recommendation platforms, ad placement systems, and more.
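
To make the delay-dependent payoff model concrete, the following is a minimal sketch of a toy environment, not the formulation used in the paper. The class name `DelayDependentBandit`, the linear-recovery-with-cap payoff shape, the Bernoulli rewards, and the convention that every arm starts with delay 1 are all assumptions chosen for illustration; the only property taken from the text is that each arm's mean payoff is non-decreasing in the number of rounds since it was last played.

```python
import random


class DelayDependentBandit:
    """Toy bandit whose arm means are non-decreasing in the delay.

    Hypothetical parametrization: mean = min(cap, base + rate * (delay - 1)),
    where delay is the number of rounds since the arm was last played.
    All means are assumed to lie in [0, 1] so they can drive Bernoulli rewards.
    """

    def __init__(self, base_means, recovery_rates, max_means, seed=0):
        self.base = list(base_means)
        self.rate = list(recovery_rates)
        self.cap = list(max_means)
        # Assumption: every arm behaves as if it was played one round ago.
        self.delay = [1] * len(self.base)
        self.rng = random.Random(seed)

    def mean(self, arm):
        # Linear recovery with a cap, hence non-decreasing in the delay.
        recovered = self.base[arm] + self.rate[arm] * (self.delay[arm] - 1)
        return min(self.cap[arm], recovered)

    def pull(self, arm):
        # Draw a Bernoulli reward from the arm's current mean.
        reward = 1.0 if self.rng.random() < self.mean(arm) else 0.0
        # The played arm's delay resets to 1; all other delays grow by one.
        for a in range(len(self.delay)):
            self.delay[a] = 1 if a == arm else self.delay[a] + 1
        return reward


if __name__ == "__main__":
    env = DelayDependentBandit([0.2, 0.5], [0.1, 0.0], [0.6, 0.5])
    for _ in range(5):
        env.pull(1)          # keep playing arm 1 so arm 0 "recovers"
    print(env.mean(0))       # arm 0's mean has risen from its base value
```

In this sketch, repeatedly playing one arm lets the other arm's mean recover toward its cap, which is the kind of user behavior (for example, renewed interest in an item that has not been shown for a while) that the model is meant to capture.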