Video Question Answering Using CLIP-Guided Visual-Text Attention
