Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization

Open in new window