Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model
Andrea Zanette, Mykel J. Kochenderfer, Emma Brunskill
Neural Information Processing Systems
Markov Decision Process (MDP) provided that we can access the reward and transition function through a generative model.
Aug-19-2025, 22:49:33 GMT
- Country:
- Europe > United Kingdom
- England
- Cambridgeshire > Cambridge (0.04)
- Greater London > London (0.04)
- North America
- Canada (0.04)
- United States > California
- Santa Clara County > Palo Alto (0.04)