Exploration-exploitation trade-off for continuous-time episodic reinforcement learning with linear-convex models
