Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
