Efficient Sequence Transduction by Jointly Predicting Tokens and Durations