Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems