Reviews: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes
Neural Information Processing Systems
The authors consider distributionally robust finite MDPs over a finite horizon. Conditional on a state-action pair, the transition probabilities must remain within a bounded L1 distance of a base measure, which is feasible because it is generated using a given reference policy. This is a nice idea. A few comments follow. Related to that question, why would the requirement of staying "close" to this policy be beneficial?
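To make the constraint in the summary concrete, here is a minimal sketch of an L1-ball membership check for a single state-action pair. It only illustrates the generic idea of staying within a bounded L1 distance of a base distribution; it is not the authors' construction. The function names, the radius `eps`, and the example numbers are all assumptions chosen for illustration.

```python
from typing import Sequence


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Return the L1 distance between two vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of next states")
    return sum(abs(a - b) for a, b in zip(p, q))


def in_l1_uncertainty_set(
    p: Sequence[float],
    p_ref: Sequence[float],
    eps: float,
    tol: float = 1e-9,
) -> bool:
    """Check that p is a probability vector within L1 distance eps of p_ref.

    p_ref plays the role of the base measure for one (state, action) pair.
    eps is a hypothetical radius; the review does not say how it is chosen.
    """
    is_distribution = all(x >= -tol for x in p) and abs(sum(p) - 1.0) <= tol
    return is_distribution and l1_distance(p, p_ref) <= eps + tol


# Illustrative numbers only: a base distribution over three next states
# and a perturbed candidate transition distribution.
p_ref = [0.5, 0.3, 0.2]
candidate = [0.6, 0.25, 0.15]

print(in_l1_uncertainty_set(candidate, p_ref, eps=0.25))
print(in_l1_uncertainty_set(candidate, p_ref, eps=0.10))
```

A robust planner would take the worst case over all distributions that pass this check. The check itself says nothing about why closeness to the reference policy's base measure would be desirable, which is the question the review raises.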
Oct-7-2024, 15:21:05 GMT