Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret
