Enabling Detailed Action Recognition Evaluation Through Video Dataset Augmentation
Neural Information Processing Systems
It is well known in the video understanding community that human action recognition models suffer from background bias, i.e., they over-rely on scene cues when making predictions. However, this effect is difficult to quantify using existing evaluation frameworks. We introduce the Human-centric Analysis Toolkit (HAT), which enables evaluation of learned background bias without the need for new manual video annotation. It does so by automatically generating synthetically manipulated videos, leveraging recent advances in image segmentation and video inpainting. Using HAT, we perform an extensive analysis of 74 action recognition models trained on the Kinetics dataset.
Jan-19-2025, 09:00:45 GMT