Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations
