A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

Jan-19-2025, 03:27:38 GMT–Neural Information Processing Systems

We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally-efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is \tilde{O} (\sqrt{\alpha T}), where T is the time horizon and \alpha is the independence number of the feedback graph. The bound against stochastic environments is O\big((\ln T) 2 \max_{S\in \mathcal I(G)} \sum_{i \in S} \Delta_i {-1}\big) where \mathcal I(G) is the family of all independent sets in a suitably defined undirected version of the graph and \Delta_i are the suboptimality gaps.The algorithm combines ideas from the EXP3 algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtain best-of-both-worlds guarantees with feedback graphs.

feedback graph, near-optimal best-of-both-world algorithm, online learning, (1 more...)

Neural Information Processing Systems

Jan-19-2025, 03:27:38 GMT

Conferences Web Page

Add feedback

Industry:
- Education > Educational Setting > Online (0.64)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (0.81)
  - Enterprise Applications > Human Resources
    - Learning Management (0.64)