CAST: Cross-Attention in Space and Time for Video Action Recognition

Neural Information Processing Systems 

Recognizing human actions in videos requires spatial and temporal understanding. Most existing action recognition models lack a balanced spatio-temporal understanding of videos.
