Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Neural Information Processing Systems

We study the problem of learning episodic Markov Decision Processes (MDPs).