The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

Neural Information Processing Systems 

When the losses are stochastically generated, [Simchowitz and Jamieson, 2019, Yang et al., 2021]