Learning Guidance Rewards with Trajectory-space Smoothing