 Reinforcement Learning


Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

Neural Information Processing Systems

The vertical separators correspond to loading network weights and replay buffer for fine-tuning while offline pre-training on replay buffer using QDagger (Section 4.1) for reincarnation. Shaded regions show 95% confidence intervals.



Hindsight Credit Assignment

Neural Information Processing Systems

A reinforcement learning (RL) agent is tasked with two fundamental, interdependent problems: exploration (how to discover useful data), and credit assignment (how to incorporate it). The simplest way of estimating the value function is by averaging returns (future discounted sums of rewards) starting from taking a in x.
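
The return-averaging estimate mentioned in this snippet can be illustrated with a minimal sketch. This is not the paper's method, only the plain Monte Carlo baseline the snippet describes. It assumes that x denotes a state and a an action, that episodes are given as lists of (state, action, reward) tuples, and that every visit to a pair is counted. The function name `monte_carlo_q` and the discount `gamma` are hypothetical choices made for illustration.

```python
from collections import defaultdict


def monte_carlo_q(episodes, gamma=0.99):
    """Estimate Q(x, a) by averaging discounted returns (every-visit Monte Carlo).

    episodes: list of trajectories, each a list of (state, action, reward) tuples.
    gamma: discount factor (an assumed parameter, not taken from the source).
    """
    totals = defaultdict(float)  # sum of returns observed after each (x, a)
    counts = defaultdict(int)    # number of times each (x, a) was visited
    for episode in episodes:
        g = 0.0
        # Walk backwards so that g is the discounted return from each step onward.
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            totals[(state, action)] += g
            counts[(state, action)] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two toy episodes that both start by taking "a" in "x".
episodes = [
    [("x", "a", 1.0), ("y", "b", 2.0)],
    [("x", "a", 0.0), ("y", "b", 4.0)],
]
q = monte_carlo_q(episodes, gamma=0.5)
```

Each estimate is simply the mean of the returns that followed the pair, so it needs no model of the environment, but it can have high variance when returns differ widely between episodes.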








e140dbab44e01e699491a59c9978b924-Paper.pdf

Neural Information Processing Systems

Success stories of deep reinforcement learning (RL) from high-dimensional inputs such as pixels or large spatial layouts include achieving superhuman performance on Atari games [30, 37, 1], grandmaster level in StarCraft II [50], and grasping a diverse set of objects with impressive success rates and generalization with robots in the real world [21].