PAC-Bayesian Analysis of Contextual Bandits

Feb-16-2024, 08:22:45 GMT–Neural Information Processing Systems

We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) N goes as \sqrt{N I_{\rho_t}(S;A)}, where I_{\rho_t}(S;A) is the mutual information between states and actions (the side information) used by the algorithm at round t . If the algorithm uses all the side information, the regret bound scales as \sqrt{N \ln K}, where K is the number of actions (arms). However, if the side information I_{\rho_t}(S;A) is not fully used, the regret bound is significantly tighter. In the extreme case, when I_{\rho_t}(S;A) 0, the dependence on the number of states reduces from linear to logarithmic.

contextual bandit, pac-bayesian analysis, side information, (3 more...)

Neural Information Processing Systems

Feb-16-2024, 08:22:45 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)