Non-stationary Bandits with Knapsacks

Oct-11-2024, 10:58:21 GMT–Neural Information Processing Systems

In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-armed bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time step, the decision maker (the player) chooses an arm to play, receives a reward, and consumes a certain amount of each of multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to knapsack constraints on the resources. Existing works study the BwK problem in either a stochastic or an adversarial environment.
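The interaction protocol described in the abstract can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the paper's algorithm: the names `run_bwk`, `env_at`, and `uniform_policy` are hypothetical, rewards and consumptions are assumed to be Bernoulli, the episode is assumed to stop just before any play that would overdraw a resource, and the policy is a uniform-random placeholder. Non-stationarity is modeled only by letting the mean rewards and mean consumptions depend on the time step `t`.

```python
import random


def run_bwk(env_at, budgets, horizon, policy, seed=0):
    """Simulate one BwK episode (illustrative sketch).

    env_at(t) is a hypothetical callable returning (reward_means, cost_means):
    reward_means[a] is the mean reward of arm a at time t, and
    cost_means[a][j] is the mean consumption of resource j by arm a at time t.
    """
    rng = random.Random(seed)
    remaining = list(budgets)
    total_reward = 0.0
    rounds_played = 0
    for t in range(horizon):
        reward_means, cost_means = env_at(t)
        arm = policy(t, remaining, len(reward_means), rng)
        # Assumed model: Bernoulli reward and Bernoulli per-resource consumption.
        reward = 1.0 if rng.random() < reward_means[arm] else 0.0
        costs = [1.0 if rng.random() < c else 0.0 for c in cost_means[arm]]
        # Assumed stopping rule: end the episode before overdrawing any budget.
        if any(c > r for c, r in zip(costs, remaining)):
            break
        remaining = [r - c for r, c in zip(remaining, costs)]
        total_reward += reward
        rounds_played += 1
    return total_reward, remaining, rounds_played


def uniform_policy(t, remaining, num_arms, rng):
    """Placeholder policy: pick an arm uniformly at random."""
    return rng.randrange(num_arms)


def drifting_env(t):
    """Hypothetical non-stationary environment: arm 0 degrades over time."""
    p0 = max(0.1, 0.9 - 0.001 * t)
    reward_means = [p0, 0.5]
    cost_means = [[0.5, 0.2], [0.3, 0.6]]  # two arms, two resources
    return reward_means, cost_means


if __name__ == "__main__":
    reward, left, rounds = run_bwk(
        drifting_env, budgets=[50, 50], horizon=1000, policy=uniform_policy
    )
    print(reward, left, rounds)
```

A learning policy would replace `uniform_policy` with one that uses past rewards and consumptions; the loop itself only fixes the order of events (choose an arm, observe reward and consumption, update the remaining budgets).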

non-stationarity measure, non-stationary bandit, non-stationary environment, (3 more...)

Technology:
- Information Technology > Artificial Intelligence (0.40)