Taming Heavy-Tailed Losses in Adversarial Bandits and the Best-of-Both-Worlds Setting

Neural Information Processing Systems 

Consider the multi-armed bandits (MAB) problem (Auer et al., 2002a,b), which is a useful framework ... Typically, the losses are assumed to have a support on a bounded interval (e.g., ...) ... Moreover, while the former ones enjoy a logarithmic regret (i.e., ...) ... These performance discrepancies motivated the study of the Best-of-Both-Worlds (BOBW) setting.
