Reinforcement Learning in Switching Non-Stationary Markov Decision Processes: Algorithms and Convergence Analysis