Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting Jorge A. Mendez

Neural Information Processing Systems 

Policy gradient methods have shown success in learning control policies for high-dimensional dynamical systems. Their biggest downside is the amount of exploration they require before yielding high-performing policies.