Model Selection in Contextual Stochastic Bandit Problems

Dec-24-2025, 04:52:20 GMT–Neural Information Processing Systems

We study bandit model selection in stochastic environments. Our approach relies on a master algorithm that selects among candidate base algorithms. We develop a master-base algorithm abstraction that works with general classes of base algorithms and different types of adversarial master algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that allows us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems, provided that the optimal base algorithm satisfies a high-probability regret guarantee. We also prove a lower bound showing that, even when one of the base algorithms has $O(\log T)$ regret, it is in general impossible to achieve better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically.
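The abstract describes the master-base abstraction only at a high level. The following is a minimal sketch of the general pattern under assumptions of our own, not the paper's algorithm. An EXP3 master (one standard adversarial bandit algorithm; the master analyzed in the paper may differ) chooses among UCB1 base learners on a plain, non-contextual Bernoulli bandit. Each base is restricted to a different subset of arms that stands in for a "model class". The sketch omits the smoothing transformation entirely, and the names `UCB1` and `exp3_model_selection` are hypothetical.

```python
import math
import random


class UCB1:
    """UCB1 base learner restricted to a subset of arms (its assumed 'model class')."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.sums = {a: 0.0 for a in self.arms}
        self.t = 0  # number of rounds in which this base was selected

    def select(self):
        self.t += 1
        # Play each arm once before relying on confidence bounds.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        return max(
            self.arms,
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def exp3_model_selection(bases, arm_means, T, gamma=0.05, seed=0):
    """Hypothetical EXP3 master over base learners; only the chosen base sees feedback."""
    rng = random.Random(seed)
    K = len(bases)
    eta = gamma / K  # EXP3 step size tied to the exploration rate gamma
    log_w = [0.0] * K  # log-weights, kept in log space to avoid overflow
    picks = [0] * K
    total = 0.0
    probs = [1.0 / K] * K
    for _ in range(T):
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        s = sum(w)
        # Mix the exponential weights with uniform exploration.
        probs = [(1 - gamma) * wi / s + gamma / K for wi in w]
        i = rng.choices(range(K), weights=probs)[0]
        arm = bases[i].select()
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        bases[i].update(arm, reward)
        # Importance-weighted reward estimate, applied to the chosen base only.
        log_w[i] += eta * reward / probs[i]
        picks[i] += 1
        total += reward
    return picks, total, probs


# Base 0 can only see arms {0, 1}; base 1 also sees the much better arm 2.
bases = [UCB1([0, 1]), UCB1([0, 1, 2])]
picks, total, probs = exp3_model_selection(bases, [0.2, 0.3, 0.9], T=5000)
```

Because each base learns only in the rounds where the master selects it, its performance depends on how often it is chosen. The abstract presents the smoothing transformation as the paper's tool for obtaining guarantees in this coupled setting; nothing equivalent is reproduced in this sketch.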

algorithm, contextual stochastic bandit problem, model selection
