On the Minimax Regret for Online Learning with Feedback Graphs
Neural Information Processing Systems
In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is $\mathcal{O}\bigl(\sqrt{\alpha T\ln K}\bigr)$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound $\Omega\bigl(\sqrt{\alpha T}\bigr)$ is known to hold for any $\alpha$. Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(1 + \ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts, while interpolating the intermediate cases.
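As a quick sanity check (not part of the original abstract), one can verify that the improved bound recovers both known endpoints by substituting the extreme values of $\alpha$:

```latex
% Experts case: alpha = 1 recovers the full-information rate.
\[
\alpha = 1:\quad
\sqrt{\alpha T\bigl(1 + \ln(K/\alpha)\bigr)}
= \sqrt{T\,(1 + \ln K)}
= \Theta\bigl(\sqrt{T\ln K}\bigr).
\]
% Bandits case: alpha = K makes the logarithmic factor vanish.
\[
\alpha = K:\quad
\sqrt{\alpha T\bigl(1 + \ln(K/\alpha)\bigr)}
= \sqrt{KT\,(1 + \ln 1)}
= \Theta\bigl(\sqrt{KT}\bigr).
\]
```

For intermediate $1 < \alpha < K$, the $1 + \ln(K/\alpha)$ factor interpolates smoothly between these two regimes.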