Adversarial Blocking Bandits

Mar-19-2025, 03:54:55 GMT–Neural Information Processing Systems

We consider a general adversarial multi-armed blocking bandit setting in which each played arm can be blocked (unavailable) for some time periods and the reward of each arm is chosen adversarially at each time period, without obeying any distribution. The setting models the allocation of scarce supplies (the arms) that replenish and can be reused only after certain time periods. We first show that, in the optimization setting, when the blocking durations and rewards are known in advance, finding an optimal policy (i.e., determining which arm to play in each round) that maximises the cumulative reward is strongly NP-hard, which rules out a fully polynomial-time approximation scheme (FPTAS) for the problem unless P = NP. To complement this result, we show that a greedy algorithm that plays the best available arm at each round provides an approximation guarantee that depends on the blocking durations and the path variance of the rewards. In the bandit setting, when the blocking durations and rewards are not known, we design two algorithms, RGA and RGA-META, for the case of bounded duration and path variation.
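The greedy rule mentioned in the abstract (play the best available arm in each round) can be illustrated with a minimal sketch of the offline setting. This is an illustration under stated assumptions, not the paper's implementation: the names `greedy_blocking_schedule` and `total_reward` are hypothetical, each arm is assumed to have one fixed blocking duration, and a duration of `d` is assumed to mean the arm is occupied for `d` rounds including the round in which it is played. It is not RGA or RGA-META, which target the bandit setting.

```python
from typing import List, Sequence


def greedy_blocking_schedule(
    rewards: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[int]:
    """Pick, in each round, the available arm with the highest reward.

    rewards[t][k] is the reward of arm k in round t, known in advance
    (offline setting). durations[k] >= 1 is an assumed fixed blocking
    duration: playing arm k in round t makes it available again in
    round t + durations[k].
    Returns the arm played in each round, or -1 if all arms are blocked.
    """
    num_arms = len(durations)
    free_at = [0] * num_arms  # first round in which each arm is available
    schedule = []
    for t, round_rewards in enumerate(rewards):
        available = [k for k in range(num_arms) if free_at[k] <= t]
        if not available:
            schedule.append(-1)  # nothing can be played this round
            continue
        # Greedy choice: best reward among currently unblocked arms.
        best = max(available, key=lambda k: round_rewards[k])
        free_at[best] = t + durations[best]
        schedule.append(best)
    return schedule


def total_reward(
    rewards: Sequence[Sequence[float]], schedule: Sequence[int]
) -> float:
    """Sum the rewards collected by a schedule, skipping idle rounds."""
    return sum(rewards[t][k] for t, k in enumerate(schedule) if k >= 0)
```

For instance, with two arms, rewards `[[1.0, 0.5], [0.9, 0.2], [0.8, 0.1], [0.1, 0.7]]` and durations `[2, 1]`, arm 0 is blocked in round 1 after being played in round 0, so the sketch falls back to arm 1 in that round even though arm 0 carries the higher reward.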

artificial intelligence, data mining, machine learning


Country:
- North America > United States (0.28)

Genre:
- Research Report (0.34)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Computational Learning Theory (0.49)
    - Representation & Reasoning > Search (0.35)
  - Data Science > Data Mining
    - Big Data (0.49)
