Meta-Personalizing Vision-Language Models to Find Named Instances in Video

Open in new window