Anytime Model Selection in Linear Bandits

Neural Information Processing Systems 

M different samples in parallel from the environment so that the reward for each agent is realized.
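
To make the cost of this parallel sampling concrete, the following is a minimal sketch in which each of the M agents proposes its own action in a round and the environment is queried once per agent. It assumes a linear reward model with additive Gaussian noise, which the title suggests but this excerpt does not specify. The names `draw_parallel_rewards`, `theta`, `actions`, and `noise_std` are hypothetical and introduced only for illustration.

```python
import random


def draw_parallel_rewards(theta, actions, noise_std=0.1, rng=None):
    """Query the environment once per agent and return one reward per agent.

    theta     : hypothetical unknown parameter vector (assumed linear model)
    actions   : list of M action vectors, one proposed by each agent
    noise_std : standard deviation of the assumed Gaussian reward noise
    """
    rng = rng or random.Random()
    rewards = []
    for x in actions:
        # Expected reward under the assumed linear model: <theta, x>
        mean = sum(t * xi for t, xi in zip(theta, x))
        # One fresh environment sample per agent, so M samples per round
        rewards.append(mean + rng.gauss(0.0, noise_std))
    return rewards


# M = 3 agents, each proposing a different action in the same round
theta = [1.0, -0.5]
actions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rewards = draw_parallel_rewards(theta, actions, noise_std=0.0, rng=random.Random(0))
```

In this sketch the number of environment queries per round equals the number of agents, so the sampling cost grows linearly with M.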