Toward Joint Language Modeling for Speech Units and Text
