Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting

Neural Information Processing Systems

We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs).