Leveraging Transformers for Weakly Supervised Object Localization in Unconstrained Videos

Open in new window