Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality

Open in new window