Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection