Revealing the unseen: Benchmarking video action recognition under occlusion