Improving End-to-End Speech Recognition with Policy Learning