Reformulating Zero-shot Action Recognition for Multi-label Actions

Jan-19-2025, 08:37:13 GMT–Neural Information Processing Systems

The goal of zero-shot action recognition (ZSAR) is to recognize action classes that were not seen during training. Traditionally, this is achieved by training a network to map, or regress, visual inputs to a semantic space, where a nearest neighbor classifier selects the closest target class. We argue that this approach is sub-optimal because it applies nearest neighbor search to a static semantic space, and that it is ineffective on multi-label videos, where two semantically distinct, co-occurring action categories cannot both be predicted with high confidence. To overcome these limitations, we propose a ZSAR framework that does not rely on nearest neighbor classification and instead consists of a pairwise scoring function. Given a video and a set of action classes, our method predicts a confidence score for each class independently.
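The contrast between the two formulations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional embeddings, the class names, and the `pairwise_scores` function, which uses a scaled cosine similarity passed through a sigmoid as a stand-in. The abstract does not specify the form of the proposed scoring function, so this is not the authors' model; it only shows why scoring each (video, class) pair independently allows several labels to be returned where nearest neighbor returns exactly one.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbor_predict(video_embedding, class_embeddings):
    """Traditional ZSAR: return the single class closest to the video."""
    return max(
        class_embeddings,
        key=lambda name: cosine(video_embedding, class_embeddings[name]),
    )


def pairwise_scores(video_embedding, class_embeddings, scale=5.0, bias=-2.5):
    """Hypothetical stand-in scorer: one independent confidence per class.

    scale and bias are illustrative constants, not values from the paper.
    """
    scores = {}
    for name, embedding in class_embeddings.items():
        logit = scale * cosine(video_embedding, embedding) + bias
        scores[name] = 1.0 / (1.0 + math.exp(-logit))
    return scores


def predict_multilabel(video_embedding, class_embeddings, threshold=0.5):
    """Keep every class whose independent score clears the threshold."""
    scores = pairwise_scores(video_embedding, class_embeddings)
    return sorted(name for name, s in scores.items() if s >= threshold)


# Toy semantic space (assumed): three mutually orthogonal action classes.
CLASS_EMBEDDINGS = {
    "running": [1.0, 0.0, 0.0],
    "talking": [0.0, 1.0, 0.0],
    "swimming": [0.0, 0.0, 1.0],
}

# A video containing both running and talking lands between the two classes.
VIDEO = [0.7, 0.7, 0.0]

if __name__ == "__main__":
    print(nearest_neighbor_predict(VIDEO, CLASS_EMBEDDINGS))
    print(predict_multilabel(VIDEO, CLASS_EMBEDDINGS))
```

In this toy setup the nearest neighbor classifier must commit to one of the two co-occurring actions, while the independent scores let both clear the threshold. A learned scoring function would replace the fixed cosine-plus-sigmoid rule used here.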

dataset, multi-label action, reformulating zero-shot action recognition

Technology:
- Information Technology > Artificial Intelligence
  - Vision (0.99)
  - Natural Language > Large Language Model (0.65)