Average-Reward Learning and Planning with Options
Yi Wan, Abhishek Naik, Richard S. Sutton
{wan6,anaik1,rsutton}@ualberta.ca
University of Alberta, Amii

Aug-17-2025, 04:34:57 GMT–Neural Information Processing Systems

We extend the options framework for temporal abstraction in reinforcement learning from discounted Markov decision processes (MDPs) to average-reward MDPs. Our contributions include general convergent off-policy inter-option learning algorithms, intra-option algorithms for learning values and models, as well as sample-based planning variants of our learning algorithms. Our algorithms and convergence proofs extend those recently developed by Wan, Naik, and Sutton.

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (0.94)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.49)
