Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback

Zhang, Chicheng; Agarwal, Alekh; Daumé III, Hal; Langford, John; Negahban, Sahand N.

Jan-2-2019, arXiv.org Machine Learning

We investigate the feasibility of learning from both fully labeled supervised data and contextual bandit data. We specifically consider settings in which the underlying learning signal may differ between these two data sources. Theoretically, we state no-regret algorithms, and prove their guarantees, for learning that is robust to divergences between the two sources. Empirically, we evaluate some of these algorithms on a large selection of datasets, showing that our approaches are feasible and helpful in practice.

algorithm, artificial intelligence, machine translation

Country:
- North America > United States > Virginia (0.14)

Genre:
- Research Report (0.81)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning (0.46)
  - Natural Language > Machine Translation (0.67)
  - Representation & Reasoning (0.70)
