Bayesian Hierarchical Reinforcement Learning

Mar-14-2024, 08:59:39 GMT–Neural Information Processing Systems

We define priors on the primitive environment model and on task pseudo-rewards. Since models for composite tasks can be complex, we use a mixed model-based/model-free learning approach to find an optimal hierarchical policy. We show empirically that (i) our approach results in improved convergence over non-Bayesian baselines, (ii) using both task hierarchies and Bayesian priors is better than either alone, (iii) taking advantage of the task hierarchy reduces the computational cost of Bayesian reinforcement learning, and (iv) in this framework, task pseudo-rewards can be learned instead of being manually specified, leading to hierarchically optimal rather than recursively optimal policies.
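
To make the idea of a prior on the primitive environment model concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small discrete state space and a symmetric Dirichlet prior over next-state distributions; the abstract does not specify the form of the prior, and the class name `DirichletTransitionModel` and its parameters are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


class DirichletTransitionModel:
    """Dirichlet-multinomial posterior over next states per (state, action).

    Assumption: the Dirichlet form, the symmetric prior_count, and every
    name in this class are illustrative; the source only states that priors
    are placed on the primitive environment model.
    """

    def __init__(self, n_states, prior_count=1.0, seed=0):
        self.n_states = n_states
        self.prior_count = prior_count  # pseudo-count added to every next state
        self.counts = defaultdict(lambda: [0] * n_states)
        self.rng = random.Random(seed)

    def update(self, state, action, next_state):
        """Record one observed primitive transition."""
        self.counts[(state, action)][next_state] += 1

    def _alphas(self, state, action):
        # Posterior Dirichlet parameters: prior pseudo-counts plus observations.
        return [self.prior_count + c for c in self.counts[(state, action)]]

    def posterior_mean(self, state, action):
        """Expected next-state distribution under the posterior."""
        alphas = self._alphas(state, action)
        total = sum(alphas)
        return [a / total for a in alphas]

    def sample(self, state, action):
        """Draw one plausible next-state distribution from the posterior.

        A Dirichlet sample is obtained by normalizing independent
        Gamma(alpha_i, 1) draws.
        """
        draws = [self.rng.gammavariate(a, 1.0) for a in self._alphas(state, action)]
        total = sum(draws)
        return [d / total for d in draws]


model = DirichletTransitionModel(n_states=3)
for _ in range(3):
    model.update(state=0, action=0, next_state=1)
model.update(state=0, action=0, next_state=2)
print(model.posterior_mean(0, 0))  # posterior parameters are [1, 4, 2], normalized by 7
```

In a model-based Bayesian learner of this kind, a sampled model could be solved for a policy at the primitive level, while value estimates for composite tasks are learned model-free; how the two are combined in the paper is not described in this abstract.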

maxq, optimal policy, subtask

Country:
- North America > United States
  - Ohio > Cuyahoga County
    - Cleveland (0.04)
  - New York > New York County
    - New York City (0.04)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.97)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (1.00)