PRODuctive bandits: Importance Weighting No More

Neural Information Processing Systems 

Prod is a seminal algorithm in full-information online learning, which has been conjectured to be fundamentally sub-optimal for multi-armed bandits.By leveraging the interpretation of Prod as a first-order OMD approximation, we present the following surprising results:1. Variants of Prod can obtain optimal regret for adversarial multi-armed bandits.