A Long N-step Surrogate Stage Reward for Deep Reinforcement Learning