Contextual Semibandits via Supervised Learning Oracles
Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudik
Decision making with partial feedback, motivated by applications including personalized medicine [22] and content recommendation [17], is receiving increasing attention from the machine learning community. These problems are formally modeled as learning from bandit feedback, where a learner repeatedly takes an action and observes a reward for that action, with the goal of maximizing cumulative reward. While bandit learning captures many problems of interest, several applications have additional structure: the action is combinatorial in nature and more detailed feedback is provided. For example, in internet applications, we often recommend sets of items and record information about the user's interaction with each individual item (e.g., a click). This additional feedback is unhelpful unless it relates to the overall reward (e.g., the total number of clicks), and, as in previous work, we assume a linear relationship. This interaction protocol is known as semibandit feedback.

Typical bandit and semibandit algorithms achieve reward that is competitive with the single best fixed action, i.e., the best medical treatment or the most popular news article for everyone. This is often inadequate for recommendation applications: while the most popular articles may get some clicks, personalizing content to the users is much more effective.
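To make the semibandit protocol concrete, here is a minimal sketch of the interaction loop. Everything in it is hypothetical (the item universe, the click model, and the uniform-then-greedy learner are placeholders for illustration, not the paper's method); it shows only the feedback structure: the learner recommends a slate of items, observes per-item feedback, and the overall reward is the sum of that feedback.

```python
import random

N_ITEMS = 10     # size of the item universe (hypothetical)
SLATE_SIZE = 3   # number of items recommended per round
ROUNDS = 1000

random.seed(0)
# Hidden per-item click probabilities, unknown to the learner.
click_prob = [random.random() for _ in range(N_ITEMS)]

def semibandit_feedback(slate):
    """Per-item feedback (e.g., clicks) for each item in the chosen slate."""
    return [1 if random.random() < click_prob[i] else 0 for i in slate]

# A naive placeholder learner: track empirical means, recommend top items.
counts = [0] * N_ITEMS
totals = [0.0] * N_ITEMS
total_reward = 0

for t in range(ROUNDS):
    if t < 100:  # explore uniformly at first, then exploit
        slate = random.sample(range(N_ITEMS), SLATE_SIZE)
    else:
        means = [totals[i] / counts[i] if counts[i] else 1.0
                 for i in range(N_ITEMS)]
        slate = sorted(range(N_ITEMS), key=lambda i: -means[i])[:SLATE_SIZE]
    feedback = semibandit_feedback(slate)  # observed per item, not just in aggregate
    for i, r in zip(slate, feedback):
        counts[i] += 1
        totals[i] += r
    total_reward += sum(feedback)  # overall reward is linear in the feedback
```

The key contrast with the plain bandit setting is the line computing `feedback`: under bandit feedback the learner would see only `sum(feedback)` for the whole slate, whereas semibandit feedback reveals each item's contribution, which is what makes per-item learning possible.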
November 4, 2016