CALVIN: Improved Contextual Video Captioning via Instruction Tuning

Oct-10-2025, 12:42:36 GMT–Neural Information Processing Systems

The recent emergence of powerful Vision-Language models (VLMs) has significantly improved image captioning.

dataset, proceedings, video

Country:
- North America > United States
  - Virginia (0.04)
  - Maryland > Prince George's County
    - College Park (0.04)
  - California > Los Angeles County
    - Los Angeles (0.04)
- Europe
  - Switzerland > Zürich
    - Zürich (0.14)
  - Netherlands > North Holland
    - Amsterdam (0.04)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Education (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (0.96)
  - Machine Learning
    - Neural Networks (0.93)
    - Performance Analysis > Accuracy (0.46)
