ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition
Neural Information Processing Systems
Our world is full of varied actions and moves across specialized domains that we, as humans, strive to identify and understand. Within any single domain, actions can often appear quite similar, making it challenging for deep models to distinguish them accurately. To evaluate the effectiveness of multimodal foundation models in helping us recognize such actions, we present ActionAtlas v1.0, a multiple-choice video question-answering benchmark featuring short videos across various sports. Each video in the dataset is paired with a question and four or five choices. The question pinpoints specific individuals, asking which choice "best" describes their action within a certain temporal context.
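The item format described above (one video, one question, four or five choices) can be sketched as a small record type. This is only an illustrative sketch: the field names, the example items, and the use of plain accuracy as the score are assumptions, not the benchmark's actual schema or metric.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class MultipleChoiceItem:
    """One entry: a video, a question, and its answer options (hypothetical schema)."""
    video_id: str              # assumed identifier, not the dataset's real key
    question: str
    choices: Tuple[str, ...]
    answer_index: int          # position of the correct choice in `choices`

    def __post_init__(self) -> None:
        # The abstract states that each question has four or five choices.
        if len(self.choices) not in (4, 5):
            raise ValueError("expected four or five choices")
        if not 0 <= self.answer_index < len(self.choices):
            raise ValueError("answer_index out of range")


def accuracy(items: Sequence[MultipleChoiceItem], predictions: Sequence[int]) -> float:
    """Fraction of items whose predicted choice index matches the answer."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(pred == item.answer_index for item, pred in zip(items, predictions))
    return correct / len(items)


# Made-up items for illustration only; they are not taken from the dataset.
items = [
    MultipleChoiceItem(
        video_id="clip_001",
        question="Which choice best describes the move by the player in red?",
        choices=("option A", "option B", "option C", "option D"),
        answer_index=1,
    ),
    MultipleChoiceItem(
        video_id="clip_002",
        question="Which choice best describes the action of the athlete on the left?",
        choices=("option A", "option B", "option C", "option D", "option E"),
        answer_index=2,
    ),
]
predictions = [1, 0]  # hypothetical model outputs, one choice index per item
score = accuracy(items, predictions)
```

Storing the answer as an index rather than as text keeps scoring independent of how each choice is worded.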
Mar-27-2025, 15:38:43 GMT
- Country:
- North America > United States (0.14)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Education (0.66)
- Government (1.00)
- Information Technology (1.00)
- Law (0.92)
- Leisure & Entertainment > Sports
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (1.00)
- Natural Language
- Chatbot (1.00)
- Large Language Model (1.00)
- Representation & Reasoning (0.92)
- Vision (1.00)
- Communications > Social Media (1.00)