Non-stationary and Varying-discounting Markov Decision Processes for Reinforcement Learning