Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target
