WhenCombinatorialThompsonSamplingmeets ApproximationRegret

Neural Information Processing Systems 

At each round t N, the agent must select one arm from a fixed set ofn arms, denoted by [n], {1,...,n}, using apolicy, based on the feedback from the previous rounds.