PRODuctive bandits: Importance Weighting No More

Neural Information Processing Systems 

Prod is a seminal algorithm for full-information online learning that has been conjectured to be fundamentally sub-optimal for multi-armed bandits.