Cascaded Cross-Modal Transformer for Audio-Textual Classification
