Budget-Constrained Bandits over General Cost and Reward Distributions

Cayci, Semih, Eryilmaz, Atilla, Srikant, R.

Feb-29-2020–arXiv.org Machine Learning

We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values as required by many applications. We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward pairs, $O(\log B)$ regret is achievable for a budget $B>0$. In order to achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation. We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.

algorithm, budget-constrained bandit, inequality, (13 more...)

arXiv.org Machine Learning

Feb-29-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Ohio (0.04)
- Europe
  - Italy (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found