CAST: Cross-Attention in Space and Time for Video Action Recognition
