CAST: Cross-Attention in Space and Time for Video Action Recognition
Neural Information Processing Systems
Recognizing human actions in videos requires spatial and temporal understanding. Most existing action recognition models lack a balanced spatio-temporal understanding of videos.
Feb-18-2026, 02:21:05 GMT