On shallow planning under partial observability
Lefebvre, Randy; Durand, Audrey
arXiv.org Artificial Intelligence
Formulating a real-world problem within the Reinforcement Learning framework involves non-trivial design choices, such as selecting a discount factor for the learning objective (the discounted cumulative reward), which sets the planning horizon of the agent. This work investigates how the discount factor affects the bias-variance trade-off, given structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon may be beneficial, especially under partial observability.
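To make the link between the discount factor and the planning horizon concrete, here is a minimal sketch using the standard textbook definitions, not the authors' analysis. The discounted return is G_0 = sum_t gamma^t r_t, so a reward k steps ahead is weighted by gamma^k, and 1 / (1 - gamma) is a common rule-of-thumb "effective horizon". The function names `discounted_return` and `effective_horizon` are hypothetical, introduced only for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return G_0 = sum_t gamma**t * rewards[t] for a finite reward sequence.

    Hypothetical helper (standard definition, not taken from the paper).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Accumulate backwards using the recursion G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb horizon 1 / (1 - gamma), in time steps (a heuristic)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    # A constant reward of 1 per step over 50 steps.
    rewards = [1.0] * 50
    for gamma in (0.5, 0.9, 0.99):
        print(
            f"gamma={gamma}: effective horizon ~ {effective_horizon(gamma):.1f} steps, "
            f"G_0 = {discounted_return(rewards, gamma):.3f}"
        )
```

A smaller gamma shrinks the effective horizon, so the return depends on fewer future rewards. Intuitively, this is where the bias-variance trade-off studied in the paper comes from: the agent optimizes a more myopic objective (bias) while averaging over a shorter, less noisy stretch of the future (lower variance).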
Jul-22-2024