DesignofExperiments forStochasticContextualLinearBandits

Neural Information Processing Systems 

In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired.