d01bda31bbcd780774ff15b534e03c40-Supplemental-Conference.pdf

Neural Information Processing Systems 

Reinforcement learning (RL) algorithms often require a large number of data samples to learn a control policy. As a result, training them directly on the real-world systems is expensive and potentially dangerous.