Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints

Daulton, Samuel; Singh, Shaun; Avadhanula, Vashist; Dimmery, Drew; Bakshy, Eytan

Nov-1-2019–arXiv.org Artificial Intelligence

Recent advances in contextual bandit optimization and reinforcement learning have garnered interest in applying these methods to real-world sequential decision-making problems. Real-world applications frequently have constraints with respect to a currently deployed policy. Many existing constraint-aware algorithms consider problems with a single objective (the reward) and a constraint on the reward with respect to a baseline policy. However, many important applications involve multiple competing objectives and auxiliary constraints. In this paper, we propose a novel Thompson sampling algorithm for multi-outcome contextual bandit problems with auxiliary constraints. We empirically evaluate our algorithm on a synthetic problem. Lastly, we apply our method to a real-world video transcoding problem and provide a practical way to navigate the trade-off between safety and performance using Bayesian optimization.
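
The abstract does not spell out the algorithm. What follows is a minimal sketch of the general idea of constraint-aware Thompson sampling, not the authors' method. It rests on several assumptions. Contexts are discrete buckets. Each arm has two Bernoulli outcomes with Beta(1, 1) priors: a reward and one auxiliary "bad event" rate, where lower is better. An arm is treated as feasible when its sampled auxiliary rate is no worse than the baseline arm's sampled rate plus a fixed `tolerance`. Among feasible arms, the one with the highest sampled reward is played. The names `ConstrainedThompsonSampler`, `simulate`, and `tolerance` are hypothetical, and so are the arm probabilities in the usage example.

```python
import numpy as np


class ConstrainedThompsonSampler:
    """Hypothetical sketch: tabular Thompson sampling with a reward outcome and
    one auxiliary outcome, each modeled as Bernoulli with a Beta(1, 1) prior.

    Assumptions (not from the paper): contexts are discrete buckets, and the
    auxiliary outcome is a "bad event" rate that must not exceed the baseline
    arm's sampled rate by more than `tolerance`.
    """

    def __init__(self, n_contexts, n_arms, baseline_arm=0, tolerance=0.0, seed=None):
        if tolerance < 0:
            raise ValueError("tolerance must be non-negative")
        self.baseline_arm = baseline_arm
        self.tolerance = tolerance
        self.rng = np.random.default_rng(seed)
        # Beta(alpha, beta) parameters per context, arm, and outcome.
        # Outcome index 0 = reward, 1 = auxiliary metric.
        self.alpha = np.ones((n_contexts, n_arms, 2))
        self.beta = np.ones((n_contexts, n_arms, 2))

    def select_arm(self, context):
        # Draw one posterior sample of every outcome for every arm.
        draws = self.rng.beta(self.alpha[context], self.beta[context])
        reward, aux = draws[:, 0], draws[:, 1]
        # Feasible: sampled auxiliary rate within `tolerance` of the baseline's
        # sampled rate. The baseline always passes, so the set is never empty.
        feasible = np.flatnonzero(aux <= aux[self.baseline_arm] + self.tolerance)
        # Among feasible arms, play the one with the highest sampled reward.
        return int(feasible[np.argmax(reward[feasible])])

    def update(self, context, arm, reward, aux):
        # Conjugate Beta-Bernoulli update for both observed outcomes.
        obs = np.array([reward, aux], dtype=float)
        self.alpha[context, arm] += obs
        self.beta[context, arm] += 1.0 - obs


def simulate(p_reward, p_aux, n_rounds=4000, tolerance=0.05, seed=0):
    """Run the sampler against hypothetical Bernoulli arms; return pull counts."""
    p_reward, p_aux = np.asarray(p_reward), np.asarray(p_aux)
    n_contexts, n_arms = p_reward.shape
    agent = ConstrainedThompsonSampler(n_contexts, n_arms, tolerance=tolerance, seed=seed)
    env = np.random.default_rng(seed + 1)
    counts = np.zeros((n_contexts, n_arms), dtype=int)
    for _ in range(n_rounds):
        ctx = int(env.integers(n_contexts))
        arm = agent.select_arm(ctx)
        r = float(env.random() < p_reward[ctx, arm])
        a = float(env.random() < p_aux[ctx, arm])
        agent.update(ctx, arm, r, a)
        counts[ctx, arm] += 1
    return counts


# Hypothetical arms (arm 0 is the baseline). In context 0, arm 2 has the best
# reward but an unsafe auxiliary rate; in context 1, arm 2 is best and safe.
counts = simulate(
    p_reward=[[0.3, 0.5, 0.8], [0.3, 0.5, 0.8]],
    p_aux=[[0.2, 0.1, 0.6], [0.2, 0.1, 0.1]],
)
print(counts)
```

With these settings, the sampler should mostly avoid arm 2 in context 0 and favor it in context 1. Sampling the constraint from the same posterior draw as the reward keeps exploration in the Thompson style. A production system might instead use a high-probability constraint check or an explicit fallback to the deployed policy.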

algorithm, constraint, safety constraint

Country:
- North America
  - Canada (0.04)
  - United States > California
    - San Mateo County > Menlo Park (0.04)

Genre:
- Research Report (0.50)

Industry:
- Health & Medicine (0.47)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.73)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning
      - Reinforcement Learning (0.54)
      - Statistical Learning (0.47)