Improved Training Mechanism for Reinforcement Learning via Online Model Selection