Goto

Collaborating Authors

 Reinforcement Learning



Inference for Batched Bandits

Neural Information Processing Systems

However, for many real-world problems it is not enough to just minimize regret on a particular problem instance. For example, suppose we have run an online education experiment using a bandit algorithm where we test different types of teaching strategies.