Logarithmic regret for episodic continuous-time linear-quadratic reinforcement learning over a finite-time horizon
