Model Selection for Contextual Bandits

Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo

Neural Information Processing Systems 

We introduce the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation. Our main result is a new model selection guarantee for linear contextual bandits.