Note that the regret ofthe algorithm in [1]satisfiesR(G,T) = O (δ(G)logn)

Neural Information Processing Systems 

The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011], is modeled by a directed graphG = (V,E) where V is the collection of bandit arms, and once an arm is triggered, all its incident arms are observed.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found