TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
