 Reinforcement Learning




Reviewer 1: Q1: I wonder if their analysis tricks of AC/NAC when applied to PG methods improve their guarantees

Neural Information Processing Systems

If their analysis tricks do improve PG guarantees, how does it compare then? Reviewer 2: Q1: It would be interesting to complement the theoretical results with empirical results on a toy problem. A1: We are working on experiments and will add these results to the revision. Q2: For the error term that disappears with a larger mini-batch (line 211). A2: Yes, this error term should be called the variance error.


Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

Neural Information Processing Systems

Recent work demonstrated that using a memory buffer of previous successful trajectories can result in more effective policies. However, existing methods may overly exploit past successful experiences, which can encourage the agent to adopt sub-optimal and myopic behaviors.