Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model

Andrea Zanette, Mykel J. Kochenderfer, Emma Brunskill

Neural Information Processing Systems 

Markov Decision Process (MDP) provided that we can access the reward and transition function through a generative model.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found