ASTRA: Aligning Speech and Text Representations for ASR without Sampling