Batched Thompson Sampling

Neural Information Processing Systems 

O(log log(T)) expected batch complexity. This is achieved through a dynamic batching strategy, which uses the agent's estimates to adaptively increase the batch duration.
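
To make the idea of a dynamic batching strategy concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes a Beta-Bernoulli bandit and a hypothetical doubling rule: the posterior is frozen within a batch, and a batch closes once some arm has been pulled as many times inside the batch as it had been observed before the batch began. The function name `batched_thompson`, the reward model, and the stopping rule are all illustrative assumptions. This simple rule only guarantees a number of batches logarithmic in T per arm; it is not claimed to reach the O(log log(T)) expected bound stated above.

```python
import random


def batched_thompson(means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling with batched feedback (illustrative).

    Assumption: a batch ends when the arm just pulled has as many
    unobserved pulls as it had observed pulls at the batch start.
    Returns (number_of_batches, total_reward).
    """
    rng = random.Random(seed)
    k = len(means)
    alpha = [1.0] * k                  # Beta posterior: 1 + observed successes
    beta = [1.0] * k                   # Beta posterior: 1 + observed failures
    seen = [0] * k                     # pulls already folded into the posterior
    pending = [[] for _ in range(k)]   # rewards withheld until the batch closes
    batches = 0
    total_reward = 0

    for t in range(horizon):
        # Sample from the posterior, which stays fixed inside a batch.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        pending[arm].append(reward)
        total_reward += reward

        # Hypothetical doubling rule; the last step always closes the batch.
        batch_full = len(pending[arm]) >= max(1, seen[arm])
        if batch_full or t == horizon - 1:
            for i in range(k):
                for r in pending[i]:
                    alpha[i] += r
                    beta[i] += 1 - r
                seen[i] += len(pending[i])
                pending[i] = []
            batches += 1

    return batches, total_reward


if __name__ == "__main__":
    n_batches, reward = batched_thompson([0.3, 0.7], horizon=1000, seed=1)
    print(f"batches used: {n_batches}, total reward: {reward}")
```

Under this rule each arm can trigger at most 1 + floor(log2(T)) batch closures, because every closure it triggers at least doubles its observed pull count. Batches therefore grow longer as the agent accumulates evidence, which is the qualitative behaviour the text describes.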