Policy Gradient for Rectangular Robust Markov Decision Processes

Neural Information Processing Systems 

However, policy gradient methods do not account for transition uncertainty, while learning policies that are robust to it can be computationally expensive.