Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization