Policy Gradient for Rectangular Robust Markov Decision Processes

Anonymous Author(s)

Neural Information Processing Systems 

We provide a closed-form expression for the worst occupation measure.