Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target