In this paper, we are particularly interested in understanding whether, and how, the choice of distributional robustness bears statistical implications in learning the desired policy, by studying the sample complexity in the widely-used generative model (Kearns and Singh, 1999).
Through extensive empirical analysis using benchmarks in the Safety Gym suite, we show that our algorithm has similar or better performance than SoT A (non-episodic) algorithms adapted for the episodic setting.