CALVIN: Improved Contextual Video Captioning via Instruction Tuning
Neural Information Processing Systems
The recent emergence of powerful Vision-Language models (VLMs) has significantly improved image captioning.
Oct-10-2025, 12:42:36 GMT
- Country:
  - North America > United States
    - Virginia (0.04)
    - Maryland > Prince George's County
      - College Park (0.04)
    - California > Los Angeles County
      - Los Angeles (0.04)
  - Europe
    - Switzerland > Zürich
      - Zürich (0.14)
    - Netherlands > North Holland
      - Amsterdam (0.04)
- Genre:
  - Research Report > Experimental Study (0.93)
- Industry:
  - Media > Film (1.00)
  - Leisure & Entertainment (1.00)
  - Education (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Natural Language > Large Language Model (0.96)
    - Machine Learning
      - Neural Networks (0.93)
      - Performance Analysis > Accuracy (0.46)