Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting
Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang
We consider contextual bandits: a setting in which a learner repeatedly takes an action on the basis of contextual information and observes a loss for that action, with the goal of minimizing cumulative loss over a series of rounds. Contextual bandit learning has received much attention and has seen substantial success in practice (e.g., Auer et al., 2002; Langford and Zhang, 2007; Agarwal et al., 2014, 2017). This line of work mostly considers small, finite action sets, yet in many real-world problems actions are chosen from an interval, so the action set is continuous and infinite. How can we learn to choose actions from continuous spaces based on loss-only feedback? We could assume that nearby actions have similar losses, for example that the losses are Lipschitz continuous as a function of the action (following Agrawal, 1995, and a long line of subsequent work).
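To make the Lipschitz assumption concrete, here is a minimal sketch (not the paper's algorithm) of the standard discretization baseline it enables: reduce the continuous action set [0, 1] to a uniform grid and run a confidence-bound bandit over the grid arms. The names `lipschitz_bandit` and `example_loss` are illustrative, not from the paper; the non-contextual setting is used purely for brevity.

```python
import math
import random

def lipschitz_bandit(loss_fn, horizon, num_arms=16, seed=0):
    """Sketch: discretize [0, 1] into num_arms grid points and run a
    UCB-style rule (mirrored for loss minimization) over the grid.
    If the expected loss is L-Lipschitz in the action, the best grid
    point is within L / (2 * num_arms) of the best continuous action,
    so discretization costs at most that much per round."""
    rng = random.Random(seed)
    grid = [(i + 0.5) / num_arms for i in range(num_arms)]  # grid midpoints
    counts = [0] * num_arms
    means = [0.0] * num_arms
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= num_arms:
            arm = t - 1  # pull each arm once to initialize
        else:
            # pick the arm with the smallest lower confidence bound on loss
            arm = min(range(num_arms),
                      key=lambda i: means[i] - math.sqrt(2 * math.log(t) / counts[i]))
        loss = loss_fn(grid[arm], rng)
        counts[arm] += 1
        means[arm] += (loss - means[arm]) / counts[arm]  # running mean update
        total += loss
    return total / horizon, counts

# Hypothetical 1-Lipschitz loss, minimized at action 0.3, with small noise.
def example_loss(a, rng):
    return abs(a - 0.3) + rng.uniform(-0.02, 0.02)
```

With `num_arms` = K, the per-round discretization error scales as 1/K while the cost of learning over the grid grows with K, which is exactly the tension the smoothing and zooming approaches in the paper are designed to handle more adaptively.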
Feb-4-2019