Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering

Open in new window