Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition