Stable deep reinforcement learning method by predicting uncertainty in rewards as a subtask