Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits

Apr-24-2026, 20:18:26 GMT–Neural Information Processing Systems

This study aims to develop bandit algorithms that automatically exploit favorable tendencies of an environment to improve performance, without any prior knowledge of that environment. We first propose an algorithm for combinatorial semi-bandits with a hybrid regret bound that has two main features: a best-of-three-worlds guarantee and multiple data-dependent regret bounds. The former means that the algorithm performs nearly optimally in all three of the following settings: an adversarial setting, a stochastic setting, and a stochastic setting with adversarial corruptions. The latter implies that, even if the environment is far from exhibiting stochastic behavior, the algorithm will perform better as long as the environment is "easy" in terms of certain metrics. The easiness metrics considered in this paper are the cumulative loss of the optimal action, the total quadratic variation of the losses, and the path-length of the loss sequence. We also show hybrid data-dependent regret bounds for adversarial linear bandits, which include a first path-length regret bound that is tight up to logarithmic factors.
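To make the three easiness metrics concrete, the following is a minimal sketch that computes them for a fixed, fully observed loss sequence. It assumes the definitions commonly used in the data-dependent bandit literature: the cumulative loss of the best fixed action, the squared Euclidean deviation of the losses from their mean, and the sum of L1 distances between consecutive loss vectors. The abstract does not state these formulas, so the norms, the normalization, and the function name `easiness_metrics` are assumptions made for illustration, and any expectation over randomness is ignored.

```python
from typing import Sequence, Tuple


def easiness_metrics(
    losses: Sequence[Sequence[float]],
    actions: Sequence[Sequence[int]],
) -> Tuple[float, float, float]:
    """Return (l_star, q, v) for a fixed loss sequence.

    losses  : T loss vectors, each of dimension d.
    actions : candidate actions, each a 0/1 vector of dimension d.

    Assumed definitions (illustrative, not taken from the abstract):
      l_star = min over actions a of sum_t <loss_t, a>
      q      = sum_t ||loss_t - mean_loss||_2^2
      v      = sum_t ||loss_{t+1} - loss_t||_1
    """
    if not losses or not actions:
        raise ValueError("losses and actions must be non-empty")

    num_rounds = len(losses)
    dim = len(losses[0])

    # Cumulative loss of the best fixed action in hindsight.
    l_star = min(
        sum(sum(l_i * a_i for l_i, a_i in zip(loss, action)) for loss in losses)
        for action in actions
    )

    # Quadratic variation around the mean loss vector; the mean is the
    # minimizer of the sum of squared Euclidean distances.
    mean = [sum(loss[i] for loss in losses) / num_rounds for i in range(dim)]
    q = sum((loss[i] - mean[i]) ** 2 for loss in losses for i in range(dim))

    # Path-length: total L1 movement between consecutive loss vectors.
    v = sum(
        abs(nxt[i] - cur[i])
        for cur, nxt in zip(losses, losses[1:])
        for i in range(dim)
    )

    return float(l_star), float(q), float(v)


# Example: two base arms, two single-arm actions, alternating losses.
example_losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
example_actions = [[1, 0], [0, 1]]
l_star, q, v = easiness_metrics(example_losses, example_actions)
```

Under these assumed definitions, a loss sequence that never changes has zero quadratic variation and zero path-length, which is one sense in which an adversarial environment can still be "easy".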

artificial intelligence, data mining, machine learning

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (0.69)