Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
