Latent Speech-Text Transformer