Oracle-Efficient Combinatorial Semi-Bandits

Jung-hun Kim, Milan Vojnović, Min-hwan Oh

Oct-27-2025–arXiv.org Machine Learning

We study the combinatorial semi-bandit problem, in which an agent selects a subset of base arms and receives individual feedback for each selected arm. While this problem generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by the high cost of combinatorial optimization: standard algorithms query an optimization oracle at every round. To tackle this, we propose oracle-efficient frameworks that significantly reduce oracle calls while maintaining tight regret guarantees. For the worst-case linear reward setting, our algorithms achieve $\tilde{O}(\sqrt{T})$ regret using only $O(\log\log T)$ oracle queries. We also propose covariance-adaptive algorithms that leverage the noise structure for improved regret, and we extend our approach to general (non-linear) rewards. Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in time, with strong theoretical guarantees.
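
The abstract does not say how the oracle calls are scheduled. This sketch assumes a common device from the batched-bandit literature: recompute the action only at epoch boundaries $t_k = \lceil T^{1-2^{-k}} \rceil$. Since $T^{2^{-K}} \le 2$ once $K \ge \log_2\log_2 T$, about $\log_2\log_2 T + 1$ epochs cover the horizon. The Python code applies that grid to a CUCB-style learner for the m-set semi-bandit with Bernoulli base arms and calls a hypothetical top-m oracle once per epoch. Several choices here are illustrative assumptions, not the authors' algorithm: the names `doubly_exponential_grid`, `top_m_oracle` and `batched_cucb`, the exploration constant 1.5, and the Bernoulli reward model. The sketch only demonstrates the oracle-call count and carries no regret guarantee.

```python
import math
import random


def doubly_exponential_grid(T):
    """Epoch end points ceil(T**(1 - 2**-k)) for k = 1..K, then T (assumed schedule).

    K = ceil(log2(log2(T))) guarantees T**(2**-K) <= 2, so about log2(log2(T)) + 1
    epochs cover the horizon. Requires T >= 2.
    """
    assert T >= 2
    K = max(1, math.ceil(math.log2(math.log2(T))))
    points = [min(T, math.ceil(T ** (1 - 2.0 ** (-k)))) for k in range(1, K + 1)]
    points.append(T)
    grid = []
    for t in points:  # drop repeats, which can occur for small T
        if not grid or t > grid[-1]:
            grid.append(t)
    return grid


def top_m_oracle(scores, m):
    """Hypothetical exact oracle for the m-set problem: the m highest-scoring arms."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:m]


def batched_cucb(means, m, T, seed=0):
    """CUCB-style semi-bandit learner (illustrative) that queries the oracle once per epoch.

    Base arms are assumed Bernoulli with the given means; every selected arm's
    reward is observed (semi-bandit feedback).
    """
    rng = random.Random(seed)
    d = len(means)
    counts = [0] * d
    sums = [0.0] * d
    oracle_calls = 0
    t = 0
    for end in doubly_exponential_grid(T):
        # UCB indices from data gathered so far; unplayed arms get +inf so they are tried.
        ucb = [
            sums[i] / counts[i] + math.sqrt(1.5 * math.log(max(t, 2)) / counts[i])
            if counts[i] > 0 else float("inf")
            for i in range(d)
        ]
        action = top_m_oracle(ucb, m)
        oracle_calls += 1
        # Replay the same super-arm until the epoch ends, updating each selected arm.
        while t < end:
            for i in action:
                reward = 1.0 if rng.random() < means[i] else 0.0
                counts[i] += 1
                sums[i] += reward
            t += 1
    return oracle_calls, counts
```

For example, `batched_cucb([0.9, 0.8, 0.5, 0.3, 0.2], m=2, T=10**5)` makes 6 oracle calls over $10^5$ rounds, whereas a learner that re-optimizes every round would make $10^5$. Replaying a fixed action within an epoch is what keeps the count doubly logarithmic. Obtaining the regret bounds claimed in the abstract requires the authors' more careful design, which this sketch does not reproduce.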

data mining, machine learning, natural language

Country:
- North America > United States (0.04)
- Europe
  - United Kingdom (0.04)
  - France (0.04)
- Asia > South Korea
  - Seoul > Seoul (0.04)

Genre:
- Research Report > Experimental Study (1.00)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.86)
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning (0.88)
    - Natural Language (0.70)