Training and Inference Efficiency of Encoder-Decoder Speech Models