Reinforcement Learning
the value of generative adversarial training for model-based reinforcement learning (RL) with offline data, especially
First, we sincerely thank all reviewers for their thoughtful comments and suggestions. We will report the variance and statistical significance of our empirical results in our revision. These shed light on the approach's effectiveness as an online recommender. These two factors help control bias in value estimation for model-based RL. Please refer to Lines 9-15 for our responses to possible new empirical evaluations.
response addressing one common point raised by Reviewer 1 and Reviewer 3 regarding how to handle the case where
We thank all the reviewers for their careful feedback and will revise our paper accordingly. Such a fact is presented in the classic paper "An analysis of temporal-difference learning with function Similar facts can be found for other TD algorithms (e.g. Reviewer 1 is correct in that a discount factor is needed. Now we address specific reviewer comments below. A reference for this is the classic paper "An Finally, the "-" sign in Line 213 is due to the Hurwitz assumption.
A neurally plausible model learns successor representations in partially observable environments
However, it is not clear how such representations might be learned and computed in partially observed, noisy environments. Here, we introduce a neurally plausible model using distributional successor features, which builds on the distributed distributional code for the representation and computation of uncertainty, and which allows for efficient value function computation in partially observed environments via the successor representation.
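For orientation, the link between the successor representation and value computation can be sketched in the simplest setting. This is the standard fully observed, tabular case, not the distributional model the abstract describes. For a fixed-policy transition matrix P and discount gamma, the successor representation is M = (I - gamma P)^{-1}, and the value function is V = M r. The three-state chain, reward vector, and function names below are hypothetical, chosen only for illustration.

```python
import numpy as np

def successor_representation(P, gamma):
    """Closed-form tabular SR: M = (I - gamma * P)^{-1}.

    P is the state-to-state transition matrix under a fixed policy.
    M[s, s'] is the expected discounted number of visits to s' starting from s.
    """
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

def value_from_sr(M, r):
    """Value function as a linear readout of the SR: V = M r."""
    return M @ r

# Hypothetical chain 0 -> 1 -> 2, with state 2 absorbing and rewarded.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
r = np.array([0.0, 0.0, 1.0])
gamma = 0.5

M = successor_representation(P, gamma)
V = value_from_sr(M, r)
print(V)
```

Because V depends on r only through a matrix-vector product, a change in rewards needs no relearning of M; that is the efficiency the successor representation is usually credited with. The partially observed, noisy case discussed in the abstract is not covered by this sketch.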