BoViLA: Bootstrapping Video-Language Alignment via LLM-Based Self-Questioning and Answering
