Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time