Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs)