ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
–Neural Information Processing Systems
Cinematography, the fundamental visual language of film, is essential for conveying narrative, emotion, and aesthetic quality. While recent Vision-Language Models (VLMs) demonstrate strong general visual understanding, their ability to comprehend the nuanced cinematic grammar embedded within individual shots remains largely unexplored and has not been robustly evaluated.
Jun-22-2026, 11:13:04 GMT
- Country:
- North America > United States (0.46)
- Asia > China (0.28)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.93)
- Industry:
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning (1.00)
- Natural Language
- Large Language Model (1.00)
- Chatbot (0.68)
- Machine Learning > Neural Networks
- Deep Learning (1.00)