On the Minimax Regret for Online Learning with Feedback Graphs
Neural Information Processing Systems
In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is $\mathcal{O}\bigl(\sqrt{\alpha T\ln K}\bigr)$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound $\Omega\bigl(\sqrt{\alpha T}\bigr)$ is known to hold for any $\alpha$. Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(1 + \ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts, while interpolating the intermediate cases.
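As a quick sanity check (not part of the original abstract), one can verify that the improved bound recovers both known endpoints by substituting the extreme values of $\alpha$:

```latex
% Experts case: alpha = 1 recovers the full-information rate.
\[
\alpha = 1:\quad
\sqrt{\alpha T\bigl(1 + \ln(K/\alpha)\bigr)}
= \sqrt{T\,(1 + \ln K)}
= \Theta\bigl(\sqrt{T\ln K}\bigr).
\]
% Bandits case: alpha = K makes the logarithmic factor vanish.
\[
\alpha = K:\quad
\sqrt{\alpha T\bigl(1 + \ln(K/\alpha)\bigr)}
= \sqrt{KT\,(1 + \ln 1)}
= \Theta\bigl(\sqrt{KT}\bigr).
\]
```

For intermediate $1 < \alpha < K$, the $1 + \ln(K/\alpha)$ factor interpolates smoothly between these two regimes.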