









Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

Neural Information Processing Systems

The vertical separators mark the loading of network weights and the replay buffer for fine-tuning, and offline pre-training on the replay buffer using QDagger (Section 4.1) for reincarnation. Shaded regions show 95% confidence intervals.



Teaching via Best-Case Counterexamples in the Learning-with-Equivalence-Queries Paradigm

Neural Information Processing Systems

We establish new connections between LwEQ-TD and LfS-TD by studying LwEQ-TD for different learner models based on the richness of their query functions. We show that LwEQ-TD is the same as wc-TD [18], RTD [22, 24], and NCTD [27] for a hypothesis class when restricting query functions to specific families.