Long N-step Surrogate Stage Reward to Reduce Variances of Deep Reinforcement Learning in Complex Problems