Zero-Shot End-to-End Spoken Language Understanding via Cross-Modal Selective Self-Training
