On shallow planning under partial observability

Jul-22-2024–arXiv.org Artificial Intelligence

Formulating a real-world problem under the Reinforcement Learning framework involves non-trivial design choices, such as selecting a discount factor for the learning objective (discounted cumulative rewards), which articulates the planning horizon of the agent. This work investigates the impact of the discount factor on the bias-variance trade-off given structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon might be beneficial, especially under partial observability.
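
To make the link between the discount factor and the planning horizon concrete, here is a minimal sketch in Python. It is not taken from the paper: the function names are hypothetical, and the relation horizon ≈ 1 / (1 - gamma) is the common rule of thumb for the "effective horizon" of a discounted objective, assumed here for illustration only.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence."""
    total = 0.0
    # Accumulate backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    """Rule-of-thumb planning horizon 1 / (1 - gamma) (an assumption, not the paper's definition)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    for gamma in (0.5, 0.75):
        print(gamma, discounted_return(rewards, gamma), effective_horizon(gamma))
```

Under this rule of thumb, a smaller discount factor weights near-term rewards more heavily and corresponds to a shorter planning horizon, which is the design choice the abstract discusses.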

partial observability, planning horizon, variation

Country:
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- North America
  - Canada (0.04)
  - United States > Massachusetts
    - Middlesex County > Cambridge (0.04)

Genre:
- Research Report > New Finding (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.89)
    - Reinforcement Learning (1.00)
  - Representation & Reasoning (1.00)