Reinforcement Learning
Average-Reward Learning and Planning with Options
Yi Wan, Abhishek Naik, Richard S. Sutton
{wan6,anaik1,rsutton}@ualberta.ca
University of Alberta, Amii
We extend the options framework for temporal abstraction in reinforcement learning from discounted Markov decision processes (MDPs) to average-reward MDPs. Our contributions include general convergent off-policy inter-option learning algorithms, intra-option algorithms for learning values and models, as well as sample-based planning variants of our learning algorithms. Our algorithms and convergence proofs extend those recently developed by Wan, Naik, and Sutton.
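As a rough illustration of what an inter-option update in the average-reward setting can look like, the sketch below applies a differential (average-reward) Q-learning step once an option has terminated. It is a simplified sketch under stated assumptions, not the algorithm from the paper: the function name, the dictionary-based value table, and the step sizes `alpha` and `eta` are all hypothetical, and the update uses the observed option duration directly.

```python
def differential_option_q_step(q, avg_reward, state, option, cum_reward,
                               duration, next_state, alpha=0.1, eta=0.5):
    """One option-level differential Q-learning step (illustrative sketch).

    q           -- dict mapping state -> {option: value estimate}
    avg_reward  -- current estimate of the reward rate (reward per time step)
    cum_reward  -- undiscounted reward accumulated while the option ran
    duration    -- number of time steps the option ran (must be >= 1)
    """
    # Value of the best option available in the state where the option ended.
    best_next = max(q[next_state].values())
    # Differential TD error: the accumulated reward is compared against the
    # reward the current rate estimate predicts over the same duration.
    td_error = (cum_reward - avg_reward * duration
                + best_next - q[state][option])
    q[state][option] += alpha * td_error
    # Assumed simplification: spread the error over the option's duration so
    # that avg_reward stays a per-step quantity.
    avg_reward += eta * alpha * td_error / duration
    return avg_reward, td_error
```

For instance, with a single state and a single option that always runs for 2 steps, collects a total reward of 4, and returns to the same state, repeated calls move the reward-rate estimate toward 2 per step.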
Supplementary: Reinforcement Learning Enhanced Explainer for Graph Neural Networks
Caihua Shan
We show our RG-Explainer for graph classification in Alg. 2. The algorithm is similar to the one for explaining node classifications, except that we train our seed locator to detect the most influential (line 4).
Input: The input graph G = (V, E), node features X, node instances I, and a trained GNN model f(·).
Check the stopping criteria by Eq. 10.