Optimistic Q-learning for average reward and episodic reinforcement learning
