Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning