Quick-Draw Bandits: Quickly Optimizing in Nonstationary Environments with Extremely Many Arms
Derek Everett, Fred Lu, Edward Raff, Fernando Camacho, James Holt
Canonical algorithms for multi-armed bandits typically assume a stationary reward environment where the size of the action space (number of arms) is small. More recently developed methods typically relax only one of these assumptions: existing non-stationary bandit policies are designed for a small number of arms, while Lipschitz, linear, and Gaussian process bandit policies are designed to handle a large (or infinite) number of arms in stationary reward environments under constraints on the reward function. In this manuscript, we propose a novel policy to learn reward environments over a continuous space using Gaussian interpolation. We show that our method efficiently learns continuous Lipschitz reward functions with $\mathcal{O}^*(\sqrt{T})$ cumulative regret. Furthermore, our method naturally extends to non-stationary problems with a simple modification. Finally, we demonstrate that our method is computationally favorable (100x to 10,000x faster) and experimentally outperforms sliding-window Gaussian process policies on datasets with non-stationarity and an extremely large number of arms.
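The abstract's core ideas can be illustrated with a minimal sketch: a bandit over a densely discretized continuous arm space that estimates the reward surface by Gaussian-kernel interpolation of past observations, adds an exploration bonus where data is sparse, and handles nonstationarity by geometrically discounting old observations. This is an assumed, simplified stand-in (Nadaraya-Watson smoothing with hypothetical bandwidth, discount, and bonus parameters), not the paper's exact algorithm or theoretical construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_reward(x, t):
    # Hypothetical nonstationary Lipschitz reward: a bump whose peak drifts over time
    peak = 0.3 + 0.4 * (t / 200)
    return np.exp(-20.0 * (x - peak) ** 2)

arms = np.linspace(0.0, 1.0, 500)   # large discretized continuous arm space
xs, ys, ws = [], [], []             # history: chosen arms, rewards, recency weights
h, gamma, beta = 0.05, 0.99, 0.5    # assumed bandwidth, discount factor, exploration bonus

chosen = []
for t in range(200):
    if not xs:
        x = rng.uniform()           # no data yet: pick an arm at random
    else:
        X, Y, W = np.array(xs), np.array(ys), np.array(ws)
        # Gaussian kernel weights between every arm and every past observation,
        # discounted by recency so stale data loses influence (nonstationarity)
        K = np.exp(-0.5 * ((arms[:, None] - X[None, :]) / h) ** 2) * W
        mass = K.sum(axis=1)
        mean = K @ Y / np.maximum(mass, 1e-12)   # interpolated reward estimate
        bonus = beta / np.sqrt(1.0 + mass)       # favor sparsely sampled regions
        x = arms[np.argmax(mean + bonus)]
    r = true_reward(x, t) + 0.1 * rng.normal()   # noisy reward
    ws = [w * gamma for w in ws]                 # geometric discounting of history
    xs.append(x); ys.append(r); ws.append(1.0)
    chosen.append(x)
```

Each step costs one kernel evaluation between the arm grid and the history, which avoids the cubic-in-history cost of exact Gaussian process posteriors; that gap is one plausible source of the speedups the abstract reports.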
Jun-2-2025