Pose Matters: Evaluating Vision Transformers and CNNs for Human Action Recognition on Small COCO Subsets
