Efficient learning by implicit exploration in bandit problems with side observations

Jan-17-2025, 12:52:01 GMT–Neural Information Processing Systems

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition to its own loss, the learner also gets to observe losses of some other actions. The revealed losses depend on the learner's action and a directed observation system chosen by the environment. For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. Along similar lines, we also define a new partial information setting that models online combinatorial optimization problems where the feedback received by the learner is between semi-bandit and full feedback.
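The abstract does not spell out the algorithm, so the following is only a minimal sketch of how implicit exploration is commonly realized in this setting, not the authors' exact method. It assumes an exponential-weights learner whose loss estimates divide each revealed loss by its observation probability plus a small bias term. The names `ix_loss_estimates`, `exp_weights_probs`, `observes`, `gamma`, and `eta` are hypothetical and chosen for illustration.

```python
import math


def exp_weights_probs(cum_estimates, eta):
    """Exponential-weights distribution over actions from cumulative loss estimates."""
    # Subtract the minimum before exponentiating for numerical stability.
    base = min(cum_estimates)
    weights = [math.exp(-eta * (c - base)) for c in cum_estimates]
    total = sum(weights)
    return [w / total for w in weights]


def ix_loss_estimates(probs, losses, observes, chosen, gamma):
    """Implicit-exploration loss estimates for one round (illustrative sketch).

    probs    -- the learner's sampling distribution for this round
    losses   -- true losses; only the revealed entries are read
    observes -- observes[j] is the set of actions whose losses are revealed
                when action j is played (assumed to include j itself)
    chosen   -- the action the learner actually played
    gamma    -- implicit-exploration bias, a small positive number (assumption)
    """
    n = len(probs)
    estimates = [0.0] * n
    for i in observes[chosen]:
        # Probability that the loss of action i is revealed in this round.
        obs_prob = sum(probs[j] for j in range(n) if i in observes[j])
        # Adding gamma to the denominator biases the estimate downward and
        # keeps it bounded, in place of forcing explicit uniform exploration.
        estimates[i] = losses[i] / (obs_prob + gamma)
    return estimates
```

In this sketch the observation system `observes` is consulted only after `chosen` has been sampled from `probs`, which mirrors the abstract's point that the learner need not know the observation system before selecting its action.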

bandit problem, implicit exploration, side observation

Industry:
- Education (0.63)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.40)
  - Artificial Intelligence
    - Representation & Reasoning (0.63)
    - Machine Learning (0.43)