Reviews: Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Neural Information Processing Systems

The restrictive assumption of restarting (which also significantly simplifies the regret analysis) was not mentioned. Note that the work by Liu, et.