Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear Quadratic Control
