Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition

Oct-10-2024, 17:27:48 GMT–Neural Information Processing Systems

Human activities often occur in specific scene contexts, e.g., playing basketball on a basketball court. A model trained on such data can rely on the scene rather than on the action itself, so the learned representation may not generalize well to new action classes or different tasks. In this paper, we propose to mitigate scene bias for video representation learning. Specifically, we augment the standard cross-entropy loss for action classification with 1) an adversarial loss for scene types and 2) a human mask confusion loss for videos where the human actors are masked out. These two losses encourage learning representations that cannot predict the scene type and cannot predict the correct action when there is no evidence of it, i.e., when the actors are masked out.
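To make the structure of the combined objective concrete, here is a minimal sketch in plain Python. The abstract only names the three terms, so the specifics below are assumptions rather than the authors' formulation: the confusion term is taken to be the negative entropy of the action prediction on the human-masked video, the adversarial term is written as a negated scene cross-entropy from the feature extractor's point of view, and the function name `debiasing_objective` and the weights `lambda_adv` and `lambda_conf` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target])


def entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def debiasing_objective(action_logits, action_label,
                        scene_logits, scene_label,
                        masked_action_logits,
                        lambda_adv=0.5, lambda_conf=0.5):
    """Objective minimised by the feature extractor (assumed form).

    lambda_adv and lambda_conf are hypothetical weights, not values
    taken from the paper.
    """
    # 1) Standard action classification loss on the original video.
    l_action = cross_entropy(action_logits, action_label)

    # 2) Adversarial scene term: a scene classifier would minimise this
    #    cross-entropy, while the feature extractor tries to increase it,
    #    hence the negative sign in the total below.
    l_scene = cross_entropy(scene_logits, scene_label)

    # 3) Human mask confusion term (assumed to be negative entropy):
    #    minimising it pushes the action prediction on the masked video
    #    towards a uniform distribution.
    l_conf = -entropy(masked_action_logits)

    total = l_action - lambda_adv * l_scene + lambda_conf * l_conf
    parts = {"action": l_action, "scene": l_scene, "confusion": l_conf}
    return total, parts


if __name__ == "__main__":
    total, parts = debiasing_objective(
        action_logits=[2.0, 0.0, 0.0], action_label=0,
        scene_logits=[0.0, 0.0], scene_label=1,
        masked_action_logits=[0.0, 0.0, 0.0],
    )
    print(total, parts)
```

Under these assumptions, the confusion term reaches its minimum when the prediction on the masked video is uniform over the action classes, and the scene term is least useful to a scene classifier when the scene logits carry no information. In an actual training setup the sign flip on the scene term is commonly realised with alternating updates or a gradient reversal step; which of these the paper uses is not stated in the abstract.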

action recognition, learning, mitigate scene bias

Technology:
- Information Technology > Artificial Intelligence > Vision (0.40)