Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits
Neural Information Processing Systems
We propose a novel piecewise stationary linear bandit (PSLB) model, in which the environment randomly samples a context from an unknown probability distribution at each changepoint, and the quality of an arm is measured by its return averaged over all contexts. The contexts and their distribution, as well as the changepoints, are unknown to the agent.
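To make the setting concrete, here is a minimal toy sketch of such an environment in Python. It is an illustration under assumptions, not the paper's formulation or algorithm: it assumes a finite set of contexts, each with a hypothetical parameter vector in `THETAS`, a linear reward equal to the inner product of the arm with the active parameter vector plus Gaussian noise, and changepoints given as a fixed list. All names (`simulate_pslb`, `ensemble_mean`, `best_arm`, `ARMS`, `THETAS`, `PROBS`) are invented for this example.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def simulate_pslb(arms, thetas, probs, changepoints, horizon, noise_sd=0.0, seed=0):
    """Simulate a toy piecewise stationary linear bandit.

    Assumption: a new context index is drawn from `probs` at time 0 and at
    every changepoint, and stays fixed in between. Returns a list of
    (t, context_index, rewards_for_every_arm) tuples.
    """
    rng = random.Random(seed)
    cps = set(changepoints)
    indices = list(range(len(thetas)))
    ctx = rng.choices(indices, weights=probs)[0]
    history = []
    for t in range(horizon):
        if t in cps:
            # The environment resamples the context at each changepoint.
            ctx = rng.choices(indices, weights=probs)[0]
        rewards = [dot(x, thetas[ctx]) + rng.gauss(0.0, noise_sd) for x in arms]
        history.append((t, ctx, rewards))
    return history


def ensemble_mean(arm, thetas, probs):
    """Expected return of an arm averaged over the context distribution."""
    return sum(p * dot(arm, th) for p, th in zip(probs, thetas))


def best_arm(arms, thetas, probs):
    """Index of the arm with the highest context-averaged return."""
    return max(range(len(arms)), key=lambda i: ensemble_mean(arms[i], thetas, probs))


# Hypothetical two-arm, two-context instance.
ARMS = [(1.0, 0.0), (0.0, 1.0)]
THETAS = [(1.0, 0.0), (0.0, 0.8)]
PROBS = [0.3, 0.7]

if __name__ == "__main__":
    for i, arm in enumerate(ARMS):
        print(i, ensemble_mean(arm, THETAS, PROBS))
    print("best arm:", best_arm(ARMS, THETAS, PROBS))
```

In this instance arm 0 is the better arm whenever context 0 is active, yet arm 1 has the higher return once averaged over contexts (about 0.56 versus 0.3), which is the notion of arm quality the abstract describes. The agent in the paper's setting would not see `THETAS`, `PROBS`, or the changepoints; they are exposed here only to define the ground truth.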
Mar-27-2025, 13:18:31 GMT
- Genre:
- Research Report > Experimental Study (0.92)