Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

arXiv.org Machine Learning 

We consider contextual bandits: a setting in which a learner repeatedly chooses an action on the basis of contextual information and observes a loss for that action, with the goal of minimizing cumulative loss over a series of rounds. Contextual bandit learning has received much attention and has seen substantial success in practice (e.g., Auer et al., 2002; Langford and Zhang, 2007; Agarwal et al., 2014, 2017). This line of work mostly considers small, finite action sets, yet in many real-world problems actions are chosen from an interval, so the action set is continuous and infinite. How can we learn to choose actions from continuous spaces based on loss-only feedback? We could assume that nearby actions have similar losses, for example that the losses are Lipschitz continuous as a function of the action (following Agrawal, 1995, and a long line of subsequent work).
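As a concrete illustration of such a smoothness assumption (the notation below is illustrative and not fixed by the paper: $\ell(x, a)$ denotes the loss of action $a$ in context $x$, and $L > 0$ a Lipschitz constant), Lipschitz continuity of the loss in the action can be written as:

```latex
% Illustrative Lipschitz condition on the loss as a function of the action:
% for every context x and all actions a, a' in the action interval,
%   the loss changes by at most L times the distance between the actions.
\lvert \ell(x, a) - \ell(x, a') \rvert \;\le\; L \, \lvert a - a' \rvert
\qquad \text{for all } a, a' \text{ in the action interval.}
```

Under an assumption of this form, nearby actions carry information about one another, which is what makes learning over a continuous action set from loss-only feedback tractable.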
