Residual Policy Gradient: A Reward View of KL-regularized Objective
