Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model
