End-to-End Speech Recognition with Similar Length Speech and Text