Polynomial Regret Concentration of UCB for Non-Deterministic State Transitions