Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations