Revealing the unseen: Benchmarking video action recognition under occlusion
Neural Information Processing Systems
In this work, we study the effect of occlusion on video action recognition. To facilitate this study, we propose three benchmark datasets and experiment with seven different video action recognition models. These datasets include two synthetic benchmarks, UCF-101-O and K-400-O, which enable understanding the effects of fundamental properties of occlusion via controlled experiments. We also propose a real-world occlusion dataset, UCF-101-Y-OCC, which helps further validate the findings of this study. We find several interesting insights, such as: 1) transformers are more robust than their CNN counterparts, 2) pretraining makes models robust against occlusions, and 3) augmentation helps but does not generalize well to real-world occlusions.
Jan-19-2025, 22:48:02 GMT
- Genre:
- Research Report (1.00)
- Technology:
- Information Technology > Artificial Intelligence > Vision (0.96)