CALVIN: Improved Contextual Video Captioning via Instruction Tuning

Neural Information Processing Systems 

The recent emergence of powerful Vision-Language models (VLMs) has significantly improved image captioning.