The Pareto Frontier of model selection for general Contextual Bandits

Jan-17-2025, 17:14:29 GMT–Neural Information Processing Systems

Recent progress in model selection raises the question of the fundamental limits of these techniques. Model selection for general contextual bandits with nested policy classes has come under particular scrutiny, resulting in a COLT 2020 open problem. It asks whether it is possible to obtain simultaneously the optimal single-algorithm guarantees over all policies in a nested sequence of policy classes or, failing that, whether this is possible for a trade-off $\alpha\in[\frac{1}{2},1)$ between the complexity term and time: $\ln(|\Pi_m|)^{1-\alpha}T^{\alpha}$. We give a disappointing answer to this question. Even in the purely stochastic regime, the desired results are unobtainable.
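
To make the trade-off concrete, the target rate can be written out and evaluated at the ends of the range of $\alpha$. This is only a sketch: the notation $\mathrm{Reg}_m(T)$ for the regret against the best policy in $\Pi_m$ after $T$ rounds is an assumption introduced here, and constants and any dependence on the number of actions are suppressed.

```latex
% Target rate from the open problem, up to constants (Reg_m(T) is assumed notation):
\mathrm{Reg}_m(T) \;\lesssim\; \ln(|\Pi_m|)^{1-\alpha}\, T^{\alpha},
\qquad \alpha \in \left[\tfrac{1}{2},\, 1\right).

% Left endpoint, \alpha = 1/2: the complexity term and time are balanced,
\ln(|\Pi_m|)^{1/2}\, T^{1/2} \;=\; \sqrt{T \ln |\Pi_m|}.

% As \alpha \to 1: the exponent 1-\alpha on the complexity term tends to 0,
% while the exponent on T tends to 1, so the rate approaches linear in T.
```

Larger $\alpha$ therefore buys a weaker dependence on the size of the policy class at the price of a worse dependence on the horizon $T$.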

general contextual bandit, model selection, Pareto frontier

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)