Meta-Personalizing Vision-Language Models to Find Named Instances in Video