Polynomial Regret Concentration of UCB for Non-Deterministic State Transitions
