Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret
Neural Information Processing Systems
The stochastic multi-armed bandit setting has recently been studied in a non-stationary regime where the mean payoff of each action is a non-decreasing function of the number of rounds elapsed since it was last played. This model captures natural behavioral aspects of users that crucially determine the performance of recommendation platforms, ad placement systems, and more.
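As a rough illustration of the recharging-payoff model described above, the following sketch computes the expected payoff of a fixed playing schedule. The saturating exponential form in `recharging_mean`, the `(base, rate)` arm parameters, and the convention that every arm counts as last played at round 0 are all assumptions made for this example; the abstract only requires that the mean payoff be non-decreasing in the number of rounds since the last play. Payoff noise is omitted, so only means are summed.

```python
import math


def recharging_mean(base, rate, delay):
    """Hypothetical mean payoff: non-decreasing in `delay`, saturating at `base`."""
    return base * (1.0 - math.exp(-rate * delay))


def expected_total(schedule, arms):
    """Sum the mean payoffs collected by playing `schedule`, a list of arm indices.

    `arms[i]` is an assumed (base, rate) pair for arm i. Each arm is treated
    as if it had last been played at round 0 (an illustrative convention).
    """
    last_played = {}
    total = 0.0
    for t, arm in enumerate(schedule, start=1):
        base, rate = arms[arm]
        delay = t - last_played.get(arm, 0)  # rounds since this arm was last played
        total += recharging_mean(base, rate, delay)
        last_played[arm] = t
    return total


# Two illustrative arms: arm 0 has the higher ceiling but recharges slowly.
arms = [(1.0, 0.5), (0.6, 2.0)]
always_best = [0] * 6            # repeatedly play the arm with the highest ceiling
alternating = [0, 1, 0, 1, 0, 1]  # give each arm a round to recharge

print(expected_total(always_best, arms))
print(expected_total(alternating, arms))
```

With these assumed parameters, alternating between the arms collects more expected payoff than repeatedly playing the single arm with the highest ceiling, which is why planning over schedules, rather than picking one best arm, matters in this setting.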
Aug-22-2025, 00:49:01 GMT
- Genre:
- Research Report (0.46)