 Reinforcement Learning







Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations

Neural Information Processing Systems

This is particularly challenging for high-dimensional control tasks, in which there may be a large number of factors that influence the agent's objective.



Provably Good Batch Reinforcement Learning Without Great Exploration

Neural Information Processing Systems

This is because, in the traditional analysis, the error bound scales up with this ratio. We show that using pessimistic value estimates in the low-data regions in Bellman optimality and evaluation back-up can yield more adaptive and stronger guarantees when the concentrability assumption does not hold.



0ee633a6ade45eab4276352b3ee79c7a-Paper-Conference.pdf

Neural Information Processing Systems

A fundamental difference between our learning problem and standard RL problems is that the realized reward feedback from conversion incrementality is mixed and delayed.