MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts

Neural Information Processing Systems 

However, most existing vision-language trackers still rely excessively on fixed initial multimodal prompts, which struggle to provide effective guidance for targets that change dynamically.
