Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality
–Neural Information Processing Systems
In this paper, we study stochastic structured bandits for minimizing regret. The fact that popular optimistic algorithms do not achieve asymptotic instance-dependent regret optimality (asymptotic optimality for short) has recently attracted considerable attention. On the other hand, it is known that one can achieve bounded regret (i.e., regret that does not grow indefinitely with the time horizon n) in certain instances. Unfortunately, existing asymptotically optimal algorithms rely on forced sampling that introduces an ω(1) term in n into their regret, failing to adapt to the "easiness" of the instance. Focusing on the finite hypothesis class, we ask whether one can achieve asymptotic optimality while enjoying bounded regret whenever possible. We provide a positive answer by introducing a new algorithm called CRush Optimism with Pessimism (CROP), which eliminates optimistic hypotheses by pulling the informative arms indicated by a pessimistic hypothesis.
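At a high level, the idea of eliminating hypotheses while steering exploration toward informative arms can be sketched as an elimination loop over a finite hypothesis class: keep the hypotheses consistent with the data so far, follow the most optimistic survivor when the survivors agree on the best arm, and otherwise pull an arm that helps distinguish them. The Python below is a minimal illustrative sketch of this elimination-with-informative-arms idea under assumed Gaussian rewards and an assumed confidence radius; it is not the paper's CROP algorithm, whose arm selection is guided by a pessimistic hypothesis rather than the crude disagreement rule used here.

import numpy as np

# Illustrative sketch only: a generic hypothesis-elimination bandit loop, not the
# paper's CROP algorithm. The hypothesis class, the Gaussian reward noise, and the
# confidence radius sqrt(2 log(t+1) / count) are all assumptions made for this demo.

rng = np.random.default_rng(0)
n_arms = 3
# A finite hypothesis class: each hypothesis is a vector of mean rewards per arm.
hypotheses = [np.array([0.9, 0.5, 0.1]),
              np.array([0.1, 0.5, 0.9]),
              np.array([0.5, 0.9, 0.1])]
true_means = hypotheses[0]        # the environment coincides with one hypothesis

counts = np.zeros(n_arms)
sums = np.zeros(n_arms)
active = list(range(len(hypotheses)))

for t in range(1, 2001):
    if not active:
        break
    # Optimistic choice: follow the surviving hypothesis promising the largest value.
    f_opt = max(active, key=lambda h: hypotheses[h].max())
    arm = int(hypotheses[f_opt].argmax())
    # If the survivors disagree on the best arm, pull the arm on which their
    # predictions differ the most (a crude stand-in for an "informative" arm).
    if len({int(hypotheses[h].argmax()) for h in active}) > 1:
        preds = np.stack([hypotheses[h] for h in active])
        arm = int((preds.max(axis=0) - preds.min(axis=0)).argmax())

    reward = rng.normal(true_means[arm], 0.1)
    counts[arm] += 1
    sums[arm] += reward

    # Eliminate hypotheses whose predicted means fall outside the confidence bands.
    means = np.divide(sums, counts, out=np.zeros(n_arms), where=counts > 0)
    radius = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1))
    active = [h for h in active
              if np.all((counts == 0) | (np.abs(hypotheses[h] - means) <= radius))]

print("surviving hypotheses:", active)

In this toy run the loop concentrates its pulls on the arm that separates the competing hypotheses, eliminates the inconsistent ones, and then commits to the optimal arm of the single survivor; the actual CROP analysis in the paper quantifies how such behavior yields asymptotic optimality and, when the instance allows it, bounded regret.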